


A THEORY OF SOME MULTIPLE DECISION PROBLEMS. II


By E. L. Lehmann
University of California, Berkeley 


Summary. The theory of Part I is extended to problems in which it is permitted
not to come to a definite conclusion regarding one or more of the questions
under consideration. Some problems are also investigated in which, from a
single set of observations, one wishes to answer a number of questions in
sequence. Here the nature of the question at a later stage will depend on the
answers obtained at the earlier stages.


9. Decision procedures permitting partial conclusions. It is a common feature
of all the problems treated in Part I, that a fixed partition of the parameter
space $\Omega$ into sets $\Omega_i$ is given, and that one wishes to determine
which of these sets contains the true parameter point. There are however many
statistical problems, such as the estimation by confidence sets, in which the
possible decisions do not correspond to the sets of a fixed partition. In
particular, this is the case in the field of statistical inference, when the
statistician is free to decide how sharp a statement he can reliably make on
the basis of the observations. We shall show in the present section how such
problems may be generated by the simultaneous consideration of a number of
two-decision problems as in Part I, if one suitably modifies the
interpretation of the decisions involved.

Previously we were concerned with testing a set of hypotheses
$H_\gamma: \theta \in \omega_\gamma$, so that in each component problem the
choice lay between the two decisions $\theta \in \omega_\gamma$ (acceptance of
$H_\gamma$) and $\theta \in \omega_\gamma^{-1}$ (rejection of $H_\gamma$).
Suppose now instead that the statistician is asked only whether the data
reject the hypothesis, and that in case they do not, no alternative positive
statement is required. The choice may then be said to lie between the
statements $\theta \in \omega_\gamma^{-1}$ and
$\theta \in \omega_\gamma^{0} = \Omega$.

This actually appears to be the point of view taken by Duncan ("Multiple
range and multiple F tests," Biometrics, Vol. 11 (1955), pp. 1-42) in his
formulation of this class of multiple decision problems to which reference is
made in section 1 of the present paper.

If one considers simultaneously a number of such problems, one is faced with
a multiple decision problem in which the different possible decisions
correspond to the statements that a certain number of the hypotheses
$H_\gamma$ are false, but where nothing is said regarding the remaining
hypotheses. This is equivalent to the statement that the parameter point
$\theta$ lies in the set

(9.1)    $\Omega_i = \bigcap_\gamma \omega_\gamma^{y_{i\gamma}}$

where the $y$'s are $-1$ for each rejected $H_\gamma$ and $0$ for the others.
In the particular case that all of the $y$'s are zero, we have
$\Omega_i = \Omega$, and thus make no statement whatever about the position of
$\theta$. As before, it may of course happen that some of the formal
intersections (9.1) are empty, and we shall then restrict the $\Omega$'s to
denote the nonempty ones and require that none of the possible decisions
should correspond to these empty intersections. If we assume that in the
simultaneous consideration of a number of problems the losses are additive so
that the total loss is the sum (or a weighted sum) of the losses of the
component problems, and if we suppose the losses to be $a_\gamma$ for
rejecting $H_\gamma$ when $\theta \in \omega_\gamma$, $b_\gamma$ for not
rejecting it when $\theta \in \omega_\gamma^{-1}$, and zero in the other two
cases, the total loss is again given by (2.2) and (2.3) with

(9.2)    $x_{i\gamma} = 2y_{i\gamma} + 1$.


Since in particular the loss function of the basic two-decision problem is
unchanged, the associated optimum unbiased procedure of Part I will retain its
optimum property in spite of the reinterpretation of one of the decisions. It
now leads to the statements $\theta \in \omega_\gamma^{-1}$ and
$\theta \in \omega_\gamma^{0}$ as $X \in A_\gamma^{-1}$ or $X \in A_\gamma$,
where $A_\gamma$ is the acceptance region of the best unbiased test of
$H_\gamma$. The simultaneous carrying out of a number of these tests then
still leads to a procedure in which the decision $d_i: \theta \in \Omega_i$ is
taken when $X$ falls in the set

(9.3)    $D_i = \bigcap_\gamma A_\gamma^{x_{i\gamma}}$

but where $\Omega_i$ is now defined by (9.1) and (9.2) instead of (2.1).

As has already been pointed out the sets $\Omega_i$, which define the possible
decisions, no longer constitute a partition of the parameter space. Instead,
they are generated through intersections from the class
$\{\omega_\gamma^{-1}, \gamma \in \Gamma\}$, that is, they constitute the
smallest class that is closed under intersections and contains the sets
$\omega_\gamma^{-1}$. It may happen that two of these $\Omega$'s are equal,
$\Omega_i = \Omega_j$, say, and one would then wish to identify the associated
decisions. On the other hand, viewing the problem as a product one must
consider all of the formal intersections (9.1) as distinct. Otherwise the
definition of the loss function, for example, would become ambiguous since the
losses resulting from decisions $d_i$ and $d_j$ when $\theta$ is in some
$\Omega_k$ would usually not be the same even though $\Omega_i = \Omega_j$.
Fortunately, the difficulty arises in the applications we wish to make only in
cases in which it can be overcome by a natural further restriction of the
decision space. Suppose namely that $H_\gamma$ and $H_\delta$ are two
hypotheses with

         $\omega_\delta \subset \omega_\gamma$

so that the two intersections $\omega_\gamma^{-1} \cap \omega_\delta^{-1}$ and
$\omega_\gamma^{-1} \cap \omega_\delta^{0}$ are identical. It then seems
reasonable that whenever the data lead to the rejection of $H_\gamma$ one
would also wish to reject the more restrictive hypothesis $H_\delta$. (In
Part I this was actually part of the compatibility requirement.) With such
tests the decision $\theta \in \omega_\gamma^{-1} \cap \omega_\delta^{0}$
would never be reached, and the conflict would thus be avoided. For this
reason we shall eliminate from the list of permissible decisions not only the
formal intersections (9.1) that are empty, but also those for which the
intersection $\Omega_i$ equals some other $\Omega_j$ satisfying
$y_{j\gamma} \le y_{i\gamma}$ for all $\gamma$. It follows from the general
discussion of Sec. 7 that (9.3) defines a 1:1 correspondence between families
of tests which are compatible with this set of restrictions, and decision
procedures for the restricted product problem. It is useful to note further
that for these restrictions the formation of restricted products of decision
problems also retains the property of being commutative and associative, which
it obviously had in the problems of Part I.

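Let $\mathcal{P}_+(\theta_0)$ denote the two-decision problem discussed above,
in which the choice lies between the statements $\theta > \theta_0$ and
$-\infty < \theta < \infty$, and let $\mathcal{P}_-(\theta_0)$ denote the dual
problem in which the first of these possibilities is replaced by the statement
$\theta < \theta_0$. By considering these two problems simultaneously one
obtains a three-decision problem with loss function

(9.4)
$$
\begin{array}{l|ccc}
 & d_2:\ \theta < \theta_0 & d_0:\ -\infty < \theta < \infty & d_1:\ \theta > \theta_0 \\
\hline
\theta < \theta_0 & 0 & b & a+b \\
\theta = \theta_0 & a & 0 & a \\
\theta > \theta_0 & a+b & b & 0
\end{array}
$$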


As an example suppose that $\theta$ measures the difference in quality of two
products which are being compared by an impartial research organization. The
decisions $d_1$ and $d_2$ claim superiority for one or the other of the
products, while $d_0$ states that the data are inconclusive and that neither
of the two products can be ascertained to be better than the other. It is an
advantage of such a formulation over the more conventional one in which $d_0$
is replaced by the statement $\theta = \theta_0$, that it enables the
statistician to control the probability of error. In the standard situation
with $D_2$ and $D_1$ given by

(9.5)    $D_2: T \le C_1$  and  $D_1: T \ge C_2$

where $P_\theta\{T \le C_1\}$ and $P_\theta\{T \ge C_2\}$ are $\le \alpha$ for
$\theta \ge \theta_0$ and $\theta \le \theta_0$ respectively, and where
$P_{\theta_0}\{T \le C_1\} = P_{\theta_0}\{T \ge C_2\} = \alpha$, the maximum
probability of error occurs when $\theta = \theta_0$ and is $2\alpha$. (A very
similar formulation was discussed by Bahadur, "A property of the
t-statistic," Sankhyā, Vol. 12 (1952), pp. 79-88.)

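The loss function (9.4) is not appropriate in situations in which a definite
decision is preferred to $d_0$ even when $\theta = \theta_0$. A formulation
which is more suitable for this case is obtained if in the one-sided problems
$\mathcal{P}_-(\theta_0)$ and $\mathcal{P}_+(\theta_0)$ one replaces the
decisions $\theta < \theta_0$ and $\theta > \theta_0$ by
$\theta \le \theta_0$ and $\theta \ge \theta_0$ respectively, so that these
two component problems are given by

(9.6a)
$$
\begin{array}{l|cc}
 & -\infty < \theta < \infty & \theta \le \theta_0 \\
\hline
\theta > \theta_0 & 0 & a \\
\theta \le \theta_0 & b & 0
\end{array}
$$

and

(9.6b)
$$
\begin{array}{l|cc}
 & -\infty < \theta < \infty & \theta \ge \theta_0 \\
\hline
\theta < \theta_0 & 0 & a \\
\theta \ge \theta_0 & b & 0
\end{array}
$$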

The simultaneous consideration of these two problems leads to a four-decision
problem with loss table

$$
\begin{array}{l|cccc}
 & d_2:\ \theta \le \theta_0 & d_0:\ -\infty < \theta < \infty & d_1:\ \theta \ge \theta_0 & d_3:\ \theta = \theta_0 \\
\hline
\theta < \theta_0 & 0 & b & a+b & a \\
\theta = \theta_0 & b & 2b & b & 0 \\
\theta > \theta_0 & a+b & b & 0 & a
\end{array}
$$

It turns out that this formulation leads to essentially the same solution as
the previous one, with $D_2$ and $D_1$ given by (9.5), decision $d_0$ being
taken when $C_1 < T < C_2$, and decision $d_3$ not occurring at all.
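
The additivity assumption behind both loss tables can be checked mechanically.
The following is a minimal sketch, not part of the original development: it
codes the true parameter by its position relative to $\theta_0$, charges $a$
for a false statement and $b$ for withholding a true one in each one-sided
component, and sums the two components. The function names and the numerical
values of $a$ and $b$ are hypothetical.

```python
from itertools import product


def component_loss(statement_true, statement_made, a, b):
    """Loss a for a false statement, b for withholding a true one, 0 otherwise."""
    if statement_made and not statement_true:
        return a
    if statement_true and not statement_made:
        return b
    return 0


def product_loss_table(lower_true, upper_true, a, b):
    """Additive loss of the product of the 'lower' and 'upper' one-sided problems.

    The true parameter is coded by s = -1, 0, +1 for theta <, =, > theta_0.
    A decision is the pair (lower statement made?, upper statement made?).
    """
    table = {}
    for s in (-1, 0, 1):
        for made_lower, made_upper in product((False, True), repeat=2):
            table[s, (made_lower, made_upper)] = (
                component_loss(lower_true(s), made_lower, a, b)
                + component_loss(upper_true(s), made_upper, a, b)
            )
    return table


# Decisions in the notation of the text.
D2, D0, D1, D3 = (True, False), (False, False), (False, True), (True, True)

a, b = 3, 1  # hypothetical numerical losses

# Strict statements theta < theta_0, theta > theta_0: table (9.4).
strict = product_loss_table(lambda s: s < 0, lambda s: s > 0, a, b)
# Closed statements theta <= theta_0, theta >= theta_0: the four-decision table.
closed = product_loss_table(lambda s: s <= 0, lambda s: s >= 0, a, b)

for s in (-1, 0, 1):
    print(s,
          [strict[s, d] for d in (D2, D0, D1)],
          [closed[s, d] for d in (D2, D0, D1, D3)])
```

In the strict case the pair `(True, True)` corresponds to an empty
intersection and is simply not read off; in the closed case it is the decision
$d_3$.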

We mention finally that still another problem leads to the same solution,
namely that given by

         $\theta < \theta_0$
         $\theta = \theta_0$        $b'$
         $\theta > \theta_0$

The level $\alpha$ of (9.5) is in this case given by

         $\alpha = \dfrac{b' - b}{a' + 2(b' - b)}$.

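10. Partial classification of one or more parameters. (i) Let $\theta$ be a
real parameter and suppose that we wish to determine, as far as possible, its
position relative to two given values $\theta_1 < \theta_2$. A procedure may
be generated by considering simultaneously the four problems
$\mathcal{P}_\pm(\theta_i)$, $i = 1, 2$. The resulting problem offers the
choice between the decisions

(10.1)
    $d_1: \{\theta < \theta_1\} = \{\theta < \theta_1\} \cap \{\theta < \theta_2\}$
    $d_2: \{\theta < \theta_2\} = \{-\infty < \theta < \infty\} \cap \{\theta < \theta_2\}$
    $d_3: \{\theta_1 < \theta < \theta_2\} = \{\theta > \theta_1\} \cap \{\theta < \theta_2\}$
    $d_4: \{\theta > \theta_1\} = \{\theta > \theta_1\} \cap \{-\infty < \theta < \infty\}$
    $d_5: \{\theta > \theta_2\} = \{\theta > \theta_1\} \cap \{\theta > \theta_2\}$
    $d_0: \{-\infty < \theta < \infty\} = \{-\infty < \theta < \infty\} \cap \{-\infty < \theta < \infty\}$.

Here the sets $\{\theta < \theta_1\}$ and $\{\theta > \theta_2\}$ may also be
represented as the intersections
$\{\theta < \theta_1\} \cap \{-\infty < \theta < \infty\}$ and
$\{\theta > \theta_2\} \cap \{-\infty < \theta < \infty\}$. However these
decisions are ruled out by the convention of the previous section.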


Suppose now that the tests of the four generating hypotheses have rejection
regions

         $T \le C_1$,   $T \ge C_1'$,   $T \le C_2$,   $T \ge C_2'$

for $H: \theta \ge \theta_1$, $\theta \le \theta_1$, $\theta \ge \theta_2$,
$\theta \le \theta_2$ respectively, where the constants are determined by

         $P_{\theta_1}\{T \le C_1\} = P_{\theta_1}\{T \ge C_1'\} = P_{\theta_2}\{T \le C_2\} = P_{\theta_2}\{T \ge C_2'\} = \alpha$.

Compatibility requires that the intersections

         $\{T \le C_1\} \cap \{T \ge C_1'\}$,   $\{T \le C_1\} \cap \{C_2 < T < C_2'\}$
         and   $\{C_1 < T < C_1'\} \cap \{T \ge C_2'\}$

should be empty, and hence that

         $C_1 < C_1'$,   $C_2 < C_2'$,   $C_1 \le C_2$   and   $C_1' \le C_2'$.

These conditions are satisfied if $\alpha < \tfrac{1}{2}$, and if for each
fixed $C$

(10.2)   $P_{\theta'}\{T \ge C\} > P_\theta\{T \ge C\}$   when $\theta < \theta'$.

According to (9.3) the resulting procedure is given by

(10.3)
    $D_1: T \le C_1$                             $D_4: \max(C_1', C_2) \lesssim T < C_2'$
    $D_2: C_1 < T \lesssim \min(C_1', C_2)$      $D_5: T \ge C_2'$
    $D_3: C_1' \le T \le C_2$                    $D_0: C_2 < T < C_1'$

where $\lesssim$ is $<$ when $C_1' \le C_2$ and $\le$ when $C_1' > C_2$.
Depending on the sign of the difference $C_2 - C_1'$ and hence on the distance
between $\theta_1$ and $\theta_2$, only one of the decisions $d_3$ and $d_0$
will occur. For intermediate values of $T$ the positive statement
$\theta_1 < \theta < \theta_2$ will be made only if $\theta_1$ and $\theta_2$
are not too close. Otherwise such $T$-values will leave the position of
$\theta$ in doubt.

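If (10.2) holds, the probability of the procedure leading to a false statement
never exceeds $2\alpha$. For the probability of error is equal to

    $P_\theta\{T \ge C_1'\} \le P_{\theta_1}\{T \ge C_1'\} = \alpha$   for $\theta < \theta_1$,
    $P_{\theta_1}\{T \le C_1\} + P_{\theta_1}\{T \ge C_1'\} = 2\alpha$   for $\theta = \theta_1$,
    $P_\theta\{T \le C_1\} + P_\theta\{T \ge C_2'\} \le P_{\theta_1}\{T \le C_1\} + P_{\theta_2}\{T \ge C_2'\} = 2\alpha$
                                                       for $\theta_1 < \theta < \theta_2$,

and similarly for $\theta \ge \theta_2$.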
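
As an illustration of (10.3) and of the bound $2\alpha$, the following sketch
treats the simplest case in which $T$ is normally distributed with mean
$\theta$ and known standard deviation. This distributional assumption, the
function names and the numerical values are chosen here for illustration only
and are not taken from the text.

```python
from statistics import NormalDist


def normal_cutoffs(theta1, theta2, alpha, sigma=1.0):
    """Return (C1, C1', C2, C2') for T ~ N(theta, sigma^2), each test at level alpha."""
    z = NormalDist().inv_cdf(1 - alpha) * sigma
    return theta1 - z, theta1 + z, theta2 - z, theta2 + z


def decide(t, c1, c1p, c2, c2p):
    """Index i of the decision d_i taken by procedure (10.3) when T = t."""
    if t <= c1:
        return 1                      # theta < theta1
    if t >= c2p:
        return 5                      # theta > theta2
    above1, below2 = t >= c1p, t <= c2
    if above1 and below2:
        return 3                      # theta1 < theta < theta2
    if below2:
        return 2                      # theta < theta2 only
    if above1:
        return 4                      # theta > theta1 only
    return 0                          # no statement


def error_probability(theta, theta1, theta2, alpha, sigma=1.0):
    """Probability that (10.3) makes a false statement when theta is true."""
    c1, c1p, c2, c2p = normal_cutoffs(theta1, theta2, alpha, sigma)
    dist = NormalDist(theta, sigma)
    # Cut-offs whose 'theta < theta_i' statement would be false at theta.
    low = [c for c, th in ((c1, theta1), (c2, theta2)) if theta >= th]
    # Cut-offs whose 'theta > theta_i' statement would be false at theta.
    high = [c for c, th in ((c1p, theta1), (c2p, theta2)) if theta <= th]
    err = 0.0
    if low:
        err += dist.cdf(max(low))
    if high:
        err += 1.0 - dist.cdf(min(high))
    return err


cuts = normal_cutoffs(0.0, 10.0, 0.05)
print([decide(t, *cuts) for t in (-2, 0, 5, 10, 12)])
print(decide(0.5, *normal_cutoffs(0.0, 1.0, 0.05)))
print(max(error_probability(k / 10, 0.0, 10.0, 0.05) for k in range(-50, 151)))
```

With $\theta_1$ and $\theta_2$ far apart an intermediate $T$ leads to $d_3$;
with the two values close together the same kind of $T$ leads to $d_0$, in
line with the remark following (10.3).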


In the usual applications $T$ is a function of a sample, and as the sample
size increases $T$ tends to $\theta$ in probability. The procedure is then
consistent in the sense that

         $P_\theta(D_i) \to 1$   for $\theta \in \Omega_i$   $(i = 1, 3, 5)$.

Except when $\theta$ is exactly equal to $\theta_1$ or $\theta_2$, the
probability therefore tends to 1 that one of the three sharp statements
$d_1$, $d_3$, or $d_5$ will be made, and that this statement will be correct.
Since

         $P_{\theta_1}(D_2) = P_{\theta_2}(D_4) = 1 - 2\alpha$,

it also follows that

         $P_{\theta_1}(D_2) = P_{\theta_2}(D_4) \to 1$

if one lets $\alpha$ tend to zero as $n$ tends to infinity.

A slightly different procedure results for problem (10.1) when one is
concerned with a sample $X_1, \dots, X_n$ from $N(\theta, \sigma^2)$. Here the
tests of the four generating hypotheses have rejection regions

         $(\bar{X} - \theta_1)/S \le -C$,   $(\bar{X} - \theta_1)/S \ge C$,
         $(\bar{X} - \theta_2)/S \le -C$,   $(\bar{X} - \theta_2)/S \ge C$,

and the induced multiple decision procedure is given by (10.3) with

(10.4)   $T = \bar{X}/S$,   $C_i = -C + \theta_i/S$,   $C_i' = C + \theta_i/S$.

Both of the decisions $d_3$ and $d_0$ occur in this procedure with positive
probability, $d_3$ only in cases in which $S \le (\theta_2 - \theta_1)/2C$
(which is the condition $C_1' \le C_2$) and $d_0$ only when the opposite
inequality holds. The remarks concerning error control and consistency require
no change.

As an application consider the comparison of two normal populations
$N(\xi, \sigma^2)$ and $N(\eta, \sigma^2)$ on the basis of samples
$X_1, \dots, X_m$ and $Y_1, \dots, Y_n$. With $\theta = \eta - \xi$,
$\theta_1 = -\Delta$, $\theta_2 = \Delta$, the possible decisions become

    $d_1: \eta - \xi < -\Delta$,   $d_3: -\Delta < \eta - \xi < \Delta$,   $d_5: \eta - \xi > \Delta$,
    $d_2: \eta - \xi < \Delta$,    $d_4: \eta - \xi > -\Delta$,            $d_0: -\infty < \eta - \xi < \infty$.

Here $d_1$ states that $\xi$ is significantly larger than $\eta$, $d_2$ that
$\eta$ is not significantly larger than $\xi$, $d_3$ that the two means do not
differ significantly, etc. The procedure is given by (10.3) with
$T = (\bar{Y} - \bar{X})/S$ and

         $C_1 = -C_2' = -C - \Delta/S$,   $C_1' = -C_2 = C - \Delta/S$.


Problem (10.1) leads to still another type of procedure in the case of two
independent Poisson variables, say $X$ and $Y$, where one wishes to classify
the ratio $\rho = \mu/\lambda$ of the parameters of the two distributions with
respect to two values $\rho_1 < \rho_2$. Here the tests of the four generating
hypotheses $\rho \le \rho_i$, $\rho \ge \rho_i$ $(i = 1, 2)$ are carried out
conditionally, given the value of $X + Y$. The conditional distribution of $Y$
given $X + Y = m$ is the binomial distribution $b(p, m)$ corresponding to $m$
trials and probability $p = \mu/(\lambda + \mu)$ of success. The conditional
situation is therefore of the kind discussed at the beginning of this section,
and leads to the procedure (10.3) with $T = Y$ and $\theta = p$.

Whether $d_3$ or $d_0$ occurs depends here on $m$, $d_3$ being associated with
large values of $m$. To see this, let $F_{p,m}(t)$ denote the conditional
cumulative distribution function, given $X + Y = m$, of $Y + U$ where $U$ is
independent of $Y$ and is uniformly distributed on $(0, 1)$. Then
$C_1' = C_1'(m)$ and $C_2 = C_2(m)$ are determined by

(10.5)   $F_{p_2,m}(C_2) = 1 - F_{p_1,m}(C_1') = \alpha$,

where $p_i = \rho_i/(1 + \rho_i)$. We shall show that $C_1'(m_1) < C_2(m_1)$
implies that also $C_1'(m_2) < C_2(m_2)$ for all $m_2 > m_1$. Since
$F_{p_2,m}(C_2) = \alpha$, $F_{p_1,m}(C_2)$ is the power of the most powerful
level $\alpha$ test for testing $p_2$ against $p_1$ in a binomial distribution
$b(p, m)$. If $C_1'(m_1) < C_2(m_1)$ it follows from (10.5) that in the case
of $m_1$ trials this power is greater than $1 - \alpha$; but then it must also
exceed $1 - \alpha$ for $m_2 > m_1$ trials so that

         $F_{p_1,m_2}(C_1') = 1 - \alpha < F_{p_1,m_2}(C_2)$,

and hence $C_1'(m_2) < C_2(m_2)$.
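
The dependence on $m$ can be seen numerically. The sketch below, offered only
as an illustration, computes the randomized cut-offs of (10.5) by bisection on
the distribution function of $Y + U$ and records for which $m$ the inequality
$C_1'(m) < C_2(m)$ holds. The values $p_1 = 0.3$, $p_2 = 0.5$,
$\alpha = 0.05$ and all function names are hypothetical choices.

```python
from math import comb, floor


def binomial_pmf(p, m):
    """Probabilities P(Y = j), j = 0..m, for Y ~ b(p, m)."""
    return [comb(m, j) * p**j * (1 - p) ** (m - j) for j in range(m + 1)]


def randomized_cdf(t, pmf):
    """P(Y + U <= t) for Y with the given pmf on 0..m and U ~ Uniform(0, 1)."""
    m = len(pmf) - 1
    if t <= 0:
        return 0.0
    if t >= m + 1:
        return 1.0
    k = floor(t)
    return sum(pmf[:k]) + (t - k) * pmf[k]


def quantile(q, p, m, iterations=60):
    """Solve F_{p,m}(t) = q by bisection; F is continuous and increasing."""
    pmf = binomial_pmf(p, m)
    lo, hi = 0.0, m + 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if randomized_cdf(mid, pmf) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def binomial_cutoffs(p1, p2, m, alpha):
    """Return (C1'(m), C2(m)) as defined by (10.5)."""
    c1_prime = quantile(1 - alpha, p1, m)   # 1 - F_{p1,m}(C1') = alpha
    c2 = quantile(alpha, p2, m)             # F_{p2,m}(C2) = alpha
    return c1_prime, c2


# For which m is the sharp statement d_3 possible?
d3_possible = []
for m in range(1, 121):
    c1_prime, c2 = binomial_cutoffs(0.3, 0.5, m, 0.05)
    d3_possible.append(c1_prime < c2)

first = d3_possible.index(True) + 1
print("d_3 first becomes possible at m =", first)
print("and remains possible afterwards:", all(d3_possible[first - 1:]))
```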

It is interesting to note that in all of these problems the choice between
decisions $d_3$ and $d_0$ depends on the distance between $\theta_1$ and
$\theta_2$ relative to the amount of information the data contain for the
problem.

While the classification of $\theta$ with respect to a single value
$\theta_0$, or two values $\theta_1 < \theta_2$, are the most interesting
cases, let us consider briefly also the problem of classifying $\theta$ with
respect to a countable set of values
$\cdots < \theta_{-2} < \theta_{-1} < \theta_0 < \theta_1 < \cdots$. This is
generated by the problems $\mathcal{P}_\pm(\theta_i)$,
$i = 0, \pm 1, \pm 2, \dots$. Supposing that $\theta_i \to \pm\infty$ as
$i \to \pm\infty$ and letting $\theta_{-\infty} = -\infty$,
$\theta_{\infty} = \infty$, the possible decisions consist of the totality of
statements $\theta_i < \theta < \theta_j$. The decision
$\theta_i < \theta < \theta_j$ corresponds to the individual decisions that
$\theta < \theta_k$ for $k \ge j$ and $\theta > \theta_k$ for $k \le i$, and
that the position of $\theta$ is left in doubt with respect to the points
$\theta_k$ with $i < k < j$.

The limiting case of this problem, in which one wishes to classify $\theta$
with respect to all possible values of $\theta_0$, is obtained by considering
simultaneously the problems $\mathcal{P}_\pm(\theta_0)$ for all $\theta_0$.
The possible decisions then consist of the totality of statements
$\underline{\theta} < \theta < \bar{\theta}$, and if $a_\gamma = a$,
$b_\gamma = b$ for all $\gamma$, (9.3) yields precisely the standard
confidence intervals for $\theta$ with confidence coefficient $1 - 2\alpha$.
The loss function resulting from the additivity assumption is in the simplest
case²

(10.6)
$$
\begin{cases}
(a+b)(\underline{\theta} - \theta) + b(\bar{\theta} - \underline{\theta}) & \text{if } \theta < \underline{\theta} \\
b(\bar{\theta} - \underline{\theta}) & \text{if } \underline{\theta} < \theta < \bar{\theta} \\
(a+b)(\theta - \bar{\theta}) + b(\bar{\theta} - \underline{\theta}) & \text{if } \theta > \bar{\theta}.
\end{cases}
$$

² A loss function of this type was suggested by Wolfowitz [4].

More generally one may replace $\bar{\theta} - \underline{\theta}$ by
$\int_{\underline{\theta}}^{\bar{\theta}} d\mu(\theta)$, and similarly for the
lengths of the other intervals. Unfortunately the condition of unbiasedness
becomes very difficult to interpret in the present problem, and it is doubtful
that for the standard distributions the procedure is unbiased with uniformly
minimum risk, as is the case with the other problems of this section.
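
That (10.6) is what additivity gives can be checked by summing the component
losses over a fine grid of values $\theta_0$. The following sketch does this
numerically under the assumption that the "sum" over the continuum of
component problems is taken as an ordinary integral with respect to
$d\theta_0$; the function names and numerical values are hypothetical.

```python
def interval_loss(theta, lower, upper, a, b):
    """Closed form (10.6) for the statement lower < theta < upper."""
    length = upper - lower
    if theta < lower:
        return (a + b) * (lower - theta) + b * length
    if theta > upper:
        return (a + b) * (theta - upper) + b * length
    return b * length


def component_loss(statement_true, statement_made, a, b):
    """Loss a for a false statement, b for withholding a true one, 0 otherwise."""
    if statement_made and not statement_true:
        return a
    if statement_true and not statement_made:
        return b
    return 0


def integrated_component_loss(theta, lower, upper, a, b, steps=20_000):
    """Midpoint Riemann sum over theta0 of the losses of P_+(theta0) and P_-(theta0)."""
    lo = min(theta, lower) - 1.0      # outside [lo, hi] every component loss is zero
    hi = max(theta, upper) + 1.0
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        theta0 = lo + (k + 0.5) * h
        says_above = theta0 <= lower  # the interval asserts theta > theta0
        says_below = theta0 >= upper  # the interval asserts theta < theta0
        total += component_loss(theta > theta0, says_above, a, b)
        total += component_loss(theta < theta0, says_below, a, b)
    return total * h


for theta in (0.0, 2.0, 4.5):
    exact = interval_loss(theta, 1.0, 3.0, a=2, b=1)
    approx = integrated_component_loss(theta, 1.0, 3.0, a=2, b=1)
    print(theta, exact, round(approx, 3))
```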

It should be pointed out that all these procedures concerning a single
real-valued parameter $\theta$ can be obtained from the standard confidence
intervals concerning $\theta$. When classifying $\theta$ with respect to
$\theta_1$ and $\theta_2$ for example, one can state $\theta < \theta_1$ if
$\bar{\theta} < \theta_1$, and $\theta_1 < \theta < \theta_2$ if
$\theta_1 < \underline{\theta} < \bar{\theta} < \theta_2$. On the other hand,
if $\underline{\theta} < \theta_1 < \bar{\theta} < \theta_2$, only the
conclusion $\theta < \theta_2$ is possible, the relation of $\theta$ to
$\theta_1$ being left in doubt. This approach, however, does not yield any
optimum properties for these procedures, and does in fact not carry over to
problems involving more than one parameter.
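
The reading-off of a decision from a confidence interval can be written out as
follows. This is only a sketch of the rule just described; the treatment of
end points that coincide with $\theta_1$ or $\theta_2$ (strict inequalities
throughout) is an assumption made here, and the function name is hypothetical.

```python
def classify_from_interval(lower, upper, theta1, theta2):
    """Sharpest statement about theta, relative to theta1 < theta2,
    that follows from the confidence interval (lower, upper)."""
    if upper < theta1:
        return "d1: theta < theta1"
    if lower > theta2:
        return "d5: theta > theta2"
    if theta1 < lower and upper < theta2:
        return "d3: theta1 < theta < theta2"
    if upper < theta2:
        return "d2: theta < theta2"
    if lower > theta1:
        return "d4: theta > theta1"
    return "d0: no statement"


print(classify_from_interval(-3.0, -1.0, 0.0, 10.0))
print(classify_from_interval(2.0, 6.0, 0.0, 10.0))
print(classify_from_interval(-1.0, 6.0, 0.0, 10.0))
print(classify_from_interval(-1.0, 12.0, 0.0, 10.0))
```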

In Example (iii) of Sec. 3 we considered the comparison of a number of normal
populations $N(\theta_i, \sigma^2)$. The consistency difficulties that
occurred in combining the decisions $\theta_i \lesseqgtr \theta_j$ for
different pairs $(i, j)$ disappear if one treats the problem from the present
point of view, which is exactly that from which the problem was treated by
Duncan, "Multiple range and multiple F-tests," Biometrics, Vol. 11 (1955),
pp. 1-42. For each pair $(\theta_i, \theta_j)$ the possible decisions are now
$\theta_i < \theta_j$, $\theta_i > \theta_j$, and
$-\infty < \theta_i - \theta_j < \infty$ instead of the earlier
$\theta_i = \theta_j$. Since there is no loss in omitting the vacuous
statements, the total decisions consist in the ordering of some but not
necessarily all of the pairs $(\theta_i, \theta_j)$. In the case of three
populations, for example, the following decision types will occur:

    (a) $\theta_i < \theta_j < \theta_k$,
    (b) $\theta_i < \theta_k$, $\theta_j < \theta_k$,
    (c) $\theta_i < \theta_j$, $\theta_i < \theta_k$,
    (d) $\theta_i < \theta_j$,
    (e) no statement.


We shall now show that for this procedure the probability of error can be
controlled through the choice of $\alpha$, and that its maximum is in fact
attained when all of the $\theta$'s are equal and is then given by

(10.7)   $P\left\{ \dfrac{|X_i - X_j|}{S} > C_{ij} \text{ for some } i \ne j \right\}$.

In the particular case of equal sample sizes this becomes

(10.8)   $P\left\{ \max_{i,j} \dfrac{|X_i - X_j|}{S} > C \right\}$

where $C$ is the cut-off point of the one-sided t-test at level $\alpha$.³
The probability of error is the probability measure of the set

(10.9)   $\bigcup \left\{ \dfrac{X_i - X_j}{S} > C_{ij} \right\} + \bigcup \left\{ \dfrac{X_k - X_l}{S} < -C_{kl} \right\}$

where the first union is extended over all pairs $(i, j)$ for which
$\theta_i \le \theta_j$ and the second union over all pairs $(k, l)$ for which
$\theta_k \ge \theta_l$. Let $X_i^* = X_i - \theta_i$, so that the $X_i^*$ are
distributed as $N(0, \sigma^2)$. Then $\theta_i \le \theta_j$ and
$X_i - X_j > C_{ij}S$ imply $X_i^* - X_j^* > C_{ij}S$, and
$\theta_k \ge \theta_l$ and $X_k - X_l < -C_{kl}S$ imply
$X_k^* - X_l^* < -C_{kl}S$. The probability of the set (10.9) is therefore not
decreased if one replaces the $X$'s by $X^*$'s, nor if one extends the union
over all pairs $(i, j)$ and $(k, l)$. But this is equivalent to evaluating the
probability of (10.9) under the assumption that all of the $\theta$'s are
equal, which completes the proof.

³ For a table of the values $C$ for which this probability is 1 per cent or
5 per cent see [2], where also a number of related tables are discussed.


11. Decision problems with simple loss functions. As a tool for proving the
procedures of Secs. 9 and 10 to be unbiased with uniformly minimum risk, we
shall now give an extension of Theorem 2 (Sec. 7), which is valid for a rather
general class of decision problems. We shall say that a decision problem
$\mathcal{P}$ is simple if it satisfies the following two conditions.

(a) Its loss function $W(\theta, d)$, considered for fixed $d$ as a function
of $\theta$, has sets of constancy independent of $d$, that is, there exists a
partition $\Pi$ of the parameter space $\Omega$ into sets $\Theta_i$,
$i \in I$, such that $W(\theta, d)$ is independent of $\theta$ on each
$\Theta_i$. We may then write

(11.1)   $W(\theta, d) = V_i(d)$   for $\theta \in \Theta_i$.

(b) With respect to some convergence notion in $\Omega$,
$\theta_n \to \theta_0$ implies
$E_{\theta_n}\psi(X) \to E_{\theta_0}\psi(X)$ for each integrable $\psi$ or,
if all of the functions $V_i$ are bounded, for each bounded $\psi$.


We shall require the following properties of simple decision problems.

(i) For any procedure $\delta$ the risk function $R_\delta(\theta)$ is
continuous on each set of the partition $\Pi$. The risk function is given by

(11.2)   $R_\delta(\theta) = E_\theta V_i[\delta(X)]$   for $\theta \in \Theta_i$.

Hence $\theta_n, \theta_0 \in \Theta_i$ and $\theta_n \to \theta_0$ imply, by
condition (b),

         $R_\delta(\theta_n) = E_{\theta_n} V_i[\delta(X)] \to E_{\theta_0} V_i[\delta(X)] = R_\delta(\theta_0)$,

as was to be proved.

(ii) Unbiasedness of a procedure $\delta$ implies the continuity of its risk
function. By (i) it is enough to prove this for boundary points of $\Pi$. Let
$\theta_0$ be such a boundary point, and suppose that
$\theta_0 \in \Theta_i$ and $\theta_0 = \lim_{n \to \infty} \theta_n$ with
$\theta_n \in \Theta_j$. Unbiasedness implies

         $E_{\theta_n} V_j[\delta(X)] \le E_{\theta_n} V_i[\delta(X)]$,
         $E_{\theta_0} V_i[\delta(X)] \le E_{\theta_0} V_j[\delta(X)]$

and hence also, letting $n \to \infty$ in the first inequality,

         $E_{\theta_0} V_j[\delta(X)] \le E_{\theta_0} V_i[\delta(X)]$.

It follows that

         $R_\delta(\theta_0) = E_{\theta_0} V_i[\delta(X)] = E_{\theta_0} V_j[\delta(X)] = \lim_{n \to \infty} R_\delta(\theta_n)$.

(iii) Any restricted product of simple decision problems is again simple.

We can now prove the main result of this section, which provides a sufficient
condition for unbiasedness of a product procedure to imply continuity of the
risk functions of all of the component procedures.

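(iv) Let $\mathcal{P}$ be a restricted product of a finite number of simple
decision problems $\mathcal{P}_\gamma$, and suppose that the partitions
$\Pi_\gamma$ of the component problems into sets $\Theta_{i\gamma}$,
$i \in I_\gamma$, satisfy the following condition.

    (*) Let $\Theta_{i\gamma_0}$, $\Theta_{j\gamma_0}$ be any two sets of
        $\Pi_{\gamma_0}$ with common boundary points, let $\theta_0$ be any
        such point, and assume without loss of generality that
        $\theta_0 \in \Theta_{i\gamma_0}$. Then there exists a sequence of
        points $\theta_n \in \Theta_{j\gamma_0}$ such that
        $\theta_n \to \theta_0$ and

            $\theta_n \sim \theta_0 \ (\Pi_\gamma)$   for all $\gamma \ne \gamma_0$,

        where $\theta \sim \theta' \ (\Pi)$ indicates that the two points lie
        in the same set of the partition.

Under these assumptions, if the risk function $R_\delta(\theta)$ of a product
procedure $\delta$ is continuous, so are the risk functions
$R_{\delta_\gamma}(\theta)$ of the components $\delta_\gamma$ of $\delta$.

To see this, let $\gamma_0$ be any fixed value of $\gamma$, $\theta_0$ any
boundary point of $\Pi_{\gamma_0}$, and $\{\theta_n\}$ the sequence guaranteed
by (*). Since $R_\delta(\theta) = \sum_\gamma R_{\delta_\gamma}(\theta)$ is
continuous, we have

         $\sum_\gamma R_{\delta_\gamma}(\theta_n) \to \sum_\gamma R_{\delta_\gamma}(\theta_0)$.

Also, for each $\gamma \ne \gamma_0$ all of the points $\theta_n$ lie in the
same set of the partition $\Pi_\gamma$ with $\theta_0$, so that by (i)

         $R_{\delta_\gamma}(\theta_n) \to R_{\delta_\gamma}(\theta_0)$   for all $\gamma \ne \gamma_0$.

It follows that
$R(\delta_{\gamma_0}, \theta_n) \to R(\delta_{\gamma_0}, \theta_0)$, as was to
be proved, where we have written $R(\delta, \theta)$ for $R_\delta(\theta)$.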

It is convenient that (*) depends only on the partitions $\Pi_\gamma$, not on
the values the loss functions $W_\gamma$ take on over these partitions. For
applications it is further important to note that (*) may be weakened
slightly. Let $\Lambda_{ij}$ be the set of common boundary points of
$\Theta_{i\gamma}$ and $\Theta_{j\gamma}$ that belongs to $\Theta_{i\gamma}$.
Then in order to ensure (iv) it is sufficient if (*) holds on a dense subset
of each $\Lambda_{ij}$. This is an immediate consequence of (i).

Consider now any problem $\mathcal{P}$, which is a restricted product of a
finite number of problems $\mathcal{P}_\pm(\theta_\gamma)$, and which
satisfies (*). For the component problems continuity of the risk function is
equivalent to similarity on the boundary at level
$\alpha_\gamma = b_\gamma/(a_\gamma + b_\gamma)$. Hence a product procedure
$\delta$ uniformly minimizes the risk among all unbiased procedures of
$\mathcal{P}$ provided each component procedure $\delta_\gamma$ uniformly
minimizes the risk among all procedures of $\mathcal{P}_\pm(\theta_\gamma)$
that are similar on the boundary at level $\alpha_\gamma$. Since
$\mathcal{P}_\pm(\theta_\gamma)$ is formally equivalent to the problem of
testing $\theta \le \theta_\gamma$ or $\theta \ge \theta_\gamma$, this is the
case in particular if the possible distributions of the observable random
variables constitute an exponential family, and the procedures
$\delta_\gamma$ are the best unbiased tests of the hypotheses in question.
Under these conditions it is then only necessary to verify (*) in order to
establish the desired optimum property for the resulting $\delta$.
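
The level $\alpha_\gamma = b_\gamma/(a_\gamma + b_\gamma)$ can be read off
from the requirement that the two branches of the risk of a component problem
meet at the boundary. For $\mathcal{P}_+(\theta_0)$ the risk is $a$ times the
rejection probability when $\theta \le \theta_0$ and $b$ times the acceptance
probability when $\theta > \theta_0$. The short sketch below, with
hypothetical function names and values, checks this with exact rational
arithmetic.

```python
from fractions import Fraction


def risk_plus(rejection_probability, theta_above_boundary, a, b):
    """Risk of P_+(theta0) at a parameter point with the given rejection probability."""
    if theta_above_boundary:
        return b * (1 - rejection_probability)   # failing to state a true theta > theta0
    return a * rejection_probability             # stating theta > theta0 falsely


def boundary_level(a, b):
    """Rejection probability at theta0 that makes the risk continuous there."""
    return Fraction(b) / (Fraction(a) + Fraction(b))


a, b = 3, 1  # hypothetical losses
level = boundary_level(a, b)
print(level, risk_plus(level, False, a, b), risk_plus(level, True, a, b))
```

With these values both one-sided limits of the risk equal $ab/(a+b)$, so the
risk is continuous at $\theta_0$ exactly when the test has rejection
probability $b/(a+b)$ there.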

As an example consider problem (9.4), which is generated by
$\mathcal{P}_\pm(\theta_0)$. Here $\mathcal{P}_+(\theta_0)$ induces the
partition $\theta \le \theta_0$, $\theta > \theta_0$, with $\theta_0$ as its
only boundary point. Let $\theta_n$ be any sequence of points greater than and
tending to $\theta_0$. Then all the points $\theta_n$ and $\theta_0$ lie in
the set $\theta \ge \theta_0$, and hence $\theta_n \sim \theta_0$ with respect
to the partition induced by $\mathcal{P}_-(\theta_0)$, which completes the
verification of (*). The argument is exactly the same for problems (9.6) and
(10.1).

In the example of Sec. 10 leading to procedure (10.7), a typical partition is
$\theta_1 \le \theta_2$, $\theta_1 > \theta_2$. Attention may be restricted to
boundary points $\theta^{(0)}$ with coordinates


Bie COM cM =e OM <M <.--- < of... 


For these, (*) is satisfied by the sequence of points 6” with coordinates 
0s” = @{° for i ¥ 2, and 6:"” between 6{” and 6§)” and tending to @{””. 


12. Consecutive decisions in a single experiment. The multiple decision
problems treated in the previous sections were generated by the simultaneous
consideration of a number of simpler component problems. We shall now suppose
that these separate problems arise not in parallel but in sequence. A single
sample is available for investigating a number of questions that are
potentially of interest and are taken up one by one. Whether a given question
is relevant, or which of a number of possible alternative formulations is
appropriate at a certain stage, depends on the decisions reached up to that
point.

(i) As an example suppose that independent variables $X_1, \dots, X_n$ from a
normal distribution $N(\xi, \sigma^2)$ are measurements on an experimental
batch of a new product of quality $\xi$. The product is of no interest unless
$\xi > \xi_0$, so that one will wish to test first of all the hypothesis
$H_1: \xi \le \xi_0$. If the quality is found satisfactory, that is, if $H_1$
is rejected, it becomes necessary to investigate the variability of the
product. One will then test $H_2: \sigma \ge \sigma_0$, and in case this
hypothesis is accepted one will try to reduce $\sigma$, for example by using
less variable materials. The problem of testing $H_2$ arises here only in case
$H_1$ is rejected.

(ii) Suppose that two treatments are being compared on a number of different
categories of patients. Let the observed effect of treatment $i$
$(i = 1, 2)$ on the $k$th patient of the $j$th category be distributed as
$N(\xi_{ij}, \sigma^2)$ where

         $\xi_{ij} = \mu + \lambda_i + \mu_j + \nu_{ij}$
         $\left(\sum_i \lambda_i = \sum_j \mu_j = \sum_i \nu_{ij} = \sum_j \nu_{ij} = 0\right)$.

Here $\lambda_i$ is the main effect of treatment $i$ and $\nu_{ij}$ the
interaction between the $i$th treatment and the $j$th category. One may
believe in the possibility of the interactions being negligible and hence wish
first to test the hypothesis $H_1: \nu_{ij} = 0$ for all $i, j$. In case $H_1$
is accepted the $\lambda$'s are the objects of primary interest, and the
problem becomes that of deciding whether $\lambda_2 - \lambda_1$ is $<$, $=$,
or $> 0$, or to estimate this difference either by confidence intervals or by
a point estimate. On the other hand, if $H_1$ is rejected one will be
concerned less with the over-all effects of the treatments, which are measured
by the $\lambda$'s, than with the treatment differences
$\xi_{2j} - \xi_{1j}$ for each category.

More generally, let there be given a first problem $\mathcal{P}'$, in which
the possible decisions are $d_1', \dots, d_m'$ and the loss function is
$W'(\theta, d)$. If decision $d_i'$ is taken, a second problem
$\mathcal{P}_i''$ arises with possible decisions $d_{ij}''$
$(j = 1, \dots, n_i)$ and loss function $W_i''(\theta, d)$. The combination of
these two problems leads to a two-stage problem with decisions
$d_{ij} = (d_i', d_{ij}'')$. One may of course continue in this manner and
suppose that decision $(d_i', d_{ij}'')$ gives rise to a further problem
$\mathcal{P}_{ij}'''$ with decisions $d_{ijk}$. However, it is enough to treat
the case of two levels since the discussion then extends immediately to the
more general situation by induction.

In specifying a loss function we shall assume that even if a wrong decision is
taken at the first step, so that the second problem is not the most
appropriate one or perhaps need not have been considered at all, it is still
desirable to do as well with respect to it as is possible. Thus in Example
(ii) above, if one has incorrectly decided that the interactions are
negligible, one will in the estimation of $\lambda_2 - \lambda_1$ still wish
to obtain as good an estimate as possible, and analogously if $H_1$ has
wrongly been rejected. Whether the assumption holds in Example (i), that is,
whether one would wish to control the variability of the new product after
having mistakenly judged its quality to be satisfactory, appears to depend on
the circumstances of the problem.

With this assumption, a natural loss function for the compound problem is

(12.1)   $W(\theta, d_{ij}) = W'(\theta, d_i') + W_i''(\theta, d_{ij}'')$.


The possibility of not considering a second problem in case a certain decision
$d_i'$ is taken at the first step, is included in this formulation. One need
then only take as problem $\mathcal{P}_i''$ the vacuous decision problem, that
is, set $n_i = 1$ and $W_i''(\theta, d_{i1}'') = 0$. Suppose in particular, as
was the case in Example (i), that a second problem occurs only for one of the
decisions of the first stage, say $d_1'$. The possible decisions of the
compound problem are then

(12.2)   $d_{1j} = (d_1', d_{1j}'')$,   $j = 1, \dots, n$
         and   $d_2 = d_2', \dots, d_m = d_m'$,

and the loss function is given by

         $W(\theta, d_i) = W'(\theta, d_i')$,   $i = 2, \dots, m$;
         $W(\theta, d_{1j}) = W'(\theta, d_1') + W''(\theta, d_{1j}'')$,   $j = 1, \dots, n$.

Returning now to the general case, suppose that there exists a satisfactory
procedure $\delta'$ for $\mathcal{P}'$, which takes decision $d_i'$ when
$X \in D_i'$, and that the problems
$\mathcal{P}_1'', \dots, \mathcal{P}_m''$ are all different. It then seems
natural to retain $\delta'$ as first step of the compound procedure, and to
consider the problems at the second level relative to the circumstances in
which they occur, namely conditionally given that
$X \in D_1', \dots, X \in D_m'$, respectively. Suppose further that for each
$i = 1, \dots, m$ there exists a satisfactory procedure $\delta_i''$ for
$\mathcal{P}_i''$ when the distribution of $X$ is the conditional distribution
given $X \in D_i'$. Such a procedure consists of a partition of the new sample
space $D_i'$ into regions $D_{ij}''$ in which the decisions $d_{ij}''$ are
taken. Together, the sets $D_{ij}''$
$(j = 1, \dots, n_i;\ i = 1, \dots, m)$ form a partition of the original
sample space and a solution of the compound problem, with decision
$d_{ij} = (d_i', d_{ij}'')$ being taken when $X \in D_{ij}''$. One can of
course again include in the formulation the possibility of ruling out some of
the decisions $d_{ij}$, and the resulting compatibility questions can be
treated exactly as before. However, this possibility seems to be less
important in the present context and in order not to complicate the discussion
unnecessarily we shall assume that no restrictions are imposed on the compound
decision problem.
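
The compound loss (12.1), together with the convention that a missing second
problem is the vacuous one, can be sketched as follows. The encoding of
Example (i), with $\theta = (\xi, \sigma)$, $\xi_0 = 0$, $\sigma_0 = 1$, and
all numerical losses are hypothetical and serve only to illustrate the
additive structure.

```python
def compound_loss(first_loss, second_losses):
    """Build W(theta, (i, j)) = W'(theta, i) + W''_i(theta, j) as in (12.1).

    second_losses maps a first-stage decision i to its second-stage loss
    function; a decision without an entry is followed by the vacuous problem.
    """
    def loss(theta, i, j=None):
        total = first_loss(theta, i)
        second = second_losses.get(i)
        if second is not None:
            total += second(theta, j)
        return total
    return loss


def quality_loss(theta, i):
    """First stage: i = 1 rejects H1: xi <= 0, i = 2 accepts it."""
    xi, _ = theta
    if i == 1:
        return 5 if xi <= 0 else 0
    return 2 if xi > 0 else 0


def variability_loss(theta, j):
    """Second stage: j = 1 rejects H2: sigma >= 1, j = 2 accepts it."""
    _, sigma = theta
    if j == 1:
        return 4 if sigma >= 1 else 0
    return 1 if sigma < 1 else 0


# A second problem arises only after first-stage decision 1.
W = compound_loss(quality_loss, {1: variability_loss})

print(W((1.0, 2.0), 1, 1))   # quality correctly accepted, variability wrongly passed
print(W((-1.0, 2.0), 1, 2))  # quality wrongly accepted, variability correctly flagged
print(W((1.0, 2.0), 2))      # product wrongly dismissed, no second problem
```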

As an application consider the case that $m$ and the $n_i$ are equal to two,
so that each of the component problems is one of hypothesis testing. Suppose
that the hypotheses in question concern the parameters $\theta_i$ in an
exponential family

         $dP_\theta(x) = C(\theta_1, \dots, \theta_s) \exp\left[\sum_{i=1}^{s} \theta_i T_i(x)\right] d\nu(x)$.

There then exist uniformly most powerful unbiased tests of the hypotheses
$\theta_i \le \theta_i^0$ and more generally of
$\sum c_i \theta_i \le c_0$. (See, for example, [2].) Since after truncation
on a fixed set $D$ the family of distributions $P_\theta$ retains its property
of forming an exponential family, such optimum unbiased tests will in
particular also exist at the second stage after a preliminary test of
significance has been performed as a first step. This will however not
coincide with the standard optimum test for the corresponding problem without
truncation.

We have assumed so far that the problems occurring at the second stage are all
distinct. Suppose now instead that $\mathcal{P}_1''$ is appropriate when
decisions $d_1', \dots, d_{k_1}'$ are taken in the first problem, that
$\mathcal{P}_2''$ corresponds to decisions
$d_{k_1+1}', \dots, d_{k_1+k_2}'$, etc. One would then consider the problems
$\mathcal{P}_1'', \mathcal{P}_2'', \dots$ conditionally given that
$X \in D_1' + \cdots + D_{k_1}'$,
$X \in D_{k_1+1}' + \cdots + D_{k_1+k_2}'$, $\dots$, and otherwise proceed as
before. If one is dealing in particular with the product of two decision
problems so that all of the problems at the second level are the same, one
considers this common problem $\mathcal{P}''$ given that
$X \in D_1' + \cdots + D_m'$, that is, unconditionally. The procedure
therefore reduces to the product of the procedures for $\mathcal{P}'$ and
$\mathcal{P}''$, so that the present theory agrees with that given earlier for
products of decision problems.

Unfortunately the properties of the conditional procedures considered in the 
present section are not as satisfactory as those of the procedures discussed in the 
earlier parts of this paper. To be specific, let $\mathcal{P}'$ and $\mathcal{P}''_1, \cdots, \mathcal{P}''_m$ define a 
two-stage problem, in which the components $\mathcal{P}''_i$ of the second stage are distinct. 
The risk function of a procedure $\delta$ with components $\delta', \delta''_1, \cdots, \delta''_m$ is then 


(12.3)    $R_\delta(\theta) = R_{\delta'}(\theta) + \sum_{i=1}^{m} P_\theta(D'_i)\, R_{\delta''_i \mid \delta'}(\theta)$ 


where the notation $R_{\delta''_i \mid \delta'}$ is used to indicate that this risk component is computed 
conditionally given $X \in D'_i$ and hence depends on $\delta'$ as well as on $\delta''_i$. 

It is clear from (12.3) that unbiasedness of $\delta'$ and $\delta''_1, \cdots, \delta''_m$ implies that of 
$\delta$. Also it is again true for most problems of interest that unbiasedness of $\delta$ 
implies either unbiasedness or at least similarity on the boundary for $\delta', \delta''_1, 
\cdots, \delta''_m$. However, the basic comparison of two procedures in terms of their 
components is no longer simple. In particular, 


$R_{\delta'}(\theta) \le R_{\eta'}(\theta), \qquad R_{\delta''_i \mid \delta'}(\theta) \le R_{\eta''_i \mid \eta'}(\theta)$ 


does not in general imply $R_\delta(\theta) \le R_\eta(\theta)$. If one wishes to minimize $R_\delta(\theta)$ then, 
given $\delta'$, one must select $\delta''_i$ to minimize $R_{\delta''_i \mid \delta'}(\theta)$. But the best choice of $\delta'$ is not 
necessarily that which minimizes $R_{\delta'}(\theta)$ since the choice of $\delta'$ influences not only 
$R_{\delta'}(\theta)$ but also the second components of (12.3) and in particular the conditional 
risks $R_{\delta''_i \mid \delta'}$. As a result of these complications it turns out that for the problem 
under consideration there usually does not exist among the unbiased procedures 
one that uniformly minimizes the risk. Of the procedures whose components 
have this optimum property we can only say that they are unbiased, 
and admissible within the class of all unbiased procedures. 


13. Some examples of conditional procedures. Although we have found no 
satisfactory justification for the procedures discussed in the preceding section, 
they are rather natural from the Neyman-Pearson point of view, and we shall 
briefly illustrate them here with a few examples leaving a more detailed discus- 
sion and comparison with alternative procedures for a later paper. 

(i) The problem mentioned at the beginning of Sec. 12 is concerned with 
testing, on the basis of a normal sample, the two hypotheses $H_1: \xi \le \xi_0$ and 
$H_2: \sigma \ge \sigma_0$, where $H_2$ is assumed to be of interest only in case $H_1$ is rejected. 
If without loss of generality we put $\xi_0 = 0$, the best unbiased procedure for testing 
$H_1$ is Student's t-test with rejection region 


(13.1)    $\bar X / S \ge C$ 


and size $\alpha_1 = b_1/(a_1 + b_1)$. With this as first step, the condition of unbiasedness 
implies in the usual way that the rejection region $R_2$ of $H_2$ must satisfy 


(13.2)    $P_{\sigma_0}(R_2 \mid S \le \bar x / C,\ \bar x) = \alpha_2.$ 


Applying the fundamental lemma of Neyman and Pearson one sees that the 
uniformly most powerful unbiased conditional test of $H_2$ has a rejection region of 
the form 


(13.3)    $S^2 < f(\bar X).$ 

Here the function $f$ is defined by 


(13.4)    $\int_0^{f(u)} p_{\sigma_0}(z)\, dz = \alpha_2 \int_0^{u^2/C^2} p_{\sigma_0}(z)\, dz, \qquad u > 0,$ 


where $p_{\sigma_0}$ is the probability density of $S^2$ when $\sigma = \sigma_0$. 


The resulting compound procedure then decides between the three possible 
conclusions 


$d_1$: $\xi \le \xi_0$ (the new product is not of satisfactory quality), 
$d_2$: $\xi > \xi_0$, $\sigma \ge \sigma_0$ (the quality of the new product is satisfactory but its 
variability must be reduced), 
$d_3$: $\xi > \xi_0$, $\sigma < \sigma_0$ (the new product is satisfactory with regard to both quality 
and variability), 

as the sample falls into the corresponding one of the regions 

$D_1$: $\bar X / S \le C$, 
$D_2$: $\bar X / S > C$, $S^2 \ge f(\bar X)$, 
$D_3$: $\bar X / S > C$, $S^2 < f(\bar X)$. 


These decision regions are illustrated for the case n = a = .05 in 
the figure. 
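
The cutoff $f(u)$ of (13.4) can be evaluated numerically. The following Python sketch is only an illustration under an added assumption that the text does not state explicitly, namely that $S^2/\sigma_0^2$ has a chi-square distribution with `df` degrees of freedom when $\sigma = \sigma_0$. The function names `chi2_cdf` and `second_stage_cutoff` and all numerical values are hypothetical.

```python
import math


def chi2_cdf(x, df):
    """Chi-square CDF, computed from the power series of the regularized
    lower incomplete gamma function P(df/2, x/2)."""
    if x <= 0.0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-17 * total:
        term *= z / (a + k)
        total += term
        k += 1
    return min(1.0, total * math.exp(a * math.log(z) - z - math.lgamma(a)))


def second_stage_cutoff(u, C, alpha2, df, sigma0=1.0):
    """Solve (13.4) for f(u) by bisection.

    Assumption (not from the paper): S^2 / sigma0^2 ~ chi-square(df).
    """
    if u <= 0.0:
        raise ValueError("(13.4) defines f(u) only for u > 0")
    upper = (u / C) ** 2                      # truncation point u^2 / C^2 for S^2
    target = alpha2 * chi2_cdf(upper / sigma0 ** 2, df)
    lo, hi = 0.0, upper
    for _ in range(100):                      # bisection on the monotone CDF
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid / sigma0 ** 2, df) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Illustrative values only.
f_u = second_stage_cutoff(u=1.5, C=0.5, alpha2=0.05, df=9)
print(f_u)
```

Because the conditioning event only truncates the distribution of $S^2$ at $u^2/C^2$, the cutoff always lies below that truncation point, and $H_2$ is rejected when the observed $S^2$ falls below `f_u`.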

(ii) As a somewhat more complex example consider two treatments which are 
being compared on a number of different categories of patients. Let the observed 
effect $Y_{ijk}$ of treatment $i$ ($i = 1, 2$) on the $k$th patient ($k = 1, \cdots, n$) in the 
$j$th category be distributed as $N(\xi_{ij}, \sigma^2)$, and let 


$\xi_{ij} = \alpha + \lambda_i + \mu_j + \nu_{ij} \qquad \left(\sum_i \lambda_i = \sum_j \mu_j = \sum_i \nu_{ij} = \sum_j \nu_{ij} = 0\right)$ 


where $\lambda_i$ is the main effect of treatment $i$ and $\nu_{ij}$ the interaction between the 
$i$th treatment and $j$th category. One may here wish to test first the hypothesis of 
no interaction 


$H_1: \nu_{ij} = 0$ for all $i, j$. 


If $H_1$ is accepted one will be interested in the difference of the $\lambda$'s and wish 
either to test it or alternatively to estimate it by confidence intervals or point 
estimate. We shall here suppose that we then want to test 


$H_2: \lambda_2 - \lambda_1 \le 0.$ 


On the other hand, if $H_1$ is rejected one will usually be concerned less with 
comparing the over-all effects of the two treatments, which is measured by the 
$\lambda$'s, than with a comparison of the treatment effects $\xi_{2j}$ and $\xi_{1j}$ separately for each 
category $j$. In particular one may be interested to test the set of hypotheses 


$H_{3j}: \xi_{2j} - \xi_{1j} = (\lambda_2 + \nu_{2j}) - (\lambda_1 + \nu_{1j}) \le 0.$ 

Although there exists no uniformly most powerful unbiased test of $H_1$ it seems 
natural to start out with the standard test of this hypothesis which is uniformly 
most powerful invariant, and is given by the acceptance region 


(13.5)    $S_1^2 / S_0^2 \le C$ 


where 

$S_0^2 = \sum\sum\sum (Y_{ijk} - Y_{ij\cdot})^2,$ 
$S_1^2 = \sum\sum (Y_{ij\cdot} - Y_{i\cdot\cdot} - Y_{\cdot j\cdot} + Y_{\cdot\cdot\cdot})^2.$ 


The hypothesis $H_2$ should then be considered conditionally, given $S_1^2/S_0^2 \le C$. 
A routine application of the theory of unbiased tests and similar regions$^4$ shows 
that a necessary condition for the rejection region $R_2$ to be unbiased is 


$P_{\lambda_1 = \lambda_2}\{R_2 \mid s_1^2,\ s_0^2 + s_2^2,\ S_1^2/S_0^2 \le C\} = \alpha_2$ 


for all values of $s_1^2$ and $s_0^2 + s_2^2$, where 

$S_2^2 = 2\sum_{i=1}^{2} (Y_{i\cdot\cdot} - Y_{\cdot\cdot\cdot})^2 = (Y_{2\cdot\cdot} - Y_{1\cdot\cdot})^2.$ 

If one puts 

$U = (Y_{2\cdot\cdot} - Y_{1\cdot\cdot})/S_0, \qquad V = S_0^2 + S_2^2, \qquad W = (S_0^2 + S_2^2)/S_1^2,$ 


it follows from the fundamental lemma that the uniformly most powerful 
(conditional) test of $H_2$ has the rejection region 


$U \ge k(V, W)$ 

where $k$ is determined by 

$P_{\lambda_1 = \lambda_2}\{U \ge k(V, W) \mid V = v,\ W = w \text{ and } U^2 \le Cw - 1\} = \alpha_2.$ 


Now when $\lambda_1 = \lambda_2$, the variable $U$ is independent of $V$ so that $k$ depends only 
on $w$, $k(v, w) = f(w)$ say. The rejection region then becomes 


(13.6)    $U \ge f(w),$ 

where $f$ is determined by 

$P_{\lambda_1 = \lambda_2}\{U < f(w) \mid U^2 \le Cw - 1\} = 1 - \alpha_2.$ 


Since acceptance of $H_1$ is equivalent to $u^2 \le Cw - 1$ it implies in particular 
that $0 \le Cw - 1$. The defining condition for $f$ may therefore be written as 


(13.7)    $\int_{-(Cw-1)^{1/2}}^{f(w)} p_U(u)\, du = (1 - \alpha_2) \int_{-(Cw-1)^{1/2}}^{(Cw-1)^{1/2}} p_U(u)\, du$ 


where $p_U(u)$ is the probability density of $U$ when $\lambda_1 = \lambda_2$, that is, essentially, 
the density of a t-distribution with $2m(n - 1)$ degrees of freedom. 
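
As a rough numerical illustration of (13.7), the sketch below solves for $f(w)$ by bisection. It assumes, as a simplification of the word "essentially" in the text, that $U$ has been rescaled so that its null density is exactly Student's t with `df` degrees of freedom. The helper names `t_pdf`, `integrate` and `truncated_t_cutoff` and the numerical inputs are hypothetical.

```python
import math


def t_pdf(u, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2.0) - math.lgamma(df / 2.0)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2.0 * math.log1p(u * u / df))


def integrate(func, lo, hi, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(lo + k * h)
    return total * h / 3.0


def truncated_t_cutoff(C, w, alpha2, df):
    """Solve (13.7) for f(w), assuming U is exactly t(df) under lambda_1 = lambda_2."""
    bound_sq = C * w - 1.0
    if bound_sq <= 0.0:
        raise ValueError("acceptance of H_1 requires C*w - 1 > 0")
    c = math.sqrt(bound_sq)                   # U is confined to [-c, c]

    def pdf(u):
        return t_pdf(u, df)

    target = (1.0 - alpha2) * integrate(pdf, -c, c)
    lo, hi = -c, c
    for _ in range(60):                       # bisection on the truncated CDF
        mid = 0.5 * (lo + hi)
        if integrate(pdf, -c, mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Illustrative values: m = 3 categories and n = 5 patients give 2m(n - 1) = 24.
cutoff = truncated_t_cutoff(C=2.0, w=3.0, alpha2=0.05, df=24)
print(cutoff)
```

The cutoff depends on the data only through $w$, in line with the remark that $k(v, w) = f(w)$, and it never exceeds the truncation bound $(Cw - 1)^{1/2}$.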


$^4$ The proof requires the easily shown fact that the family of noncentral $\chi^2$-distributions 
with a fixed number of degrees of freedom is boundedly complete. 
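Let us now consider in a similar manner the hypothesis 

$H_{31}: (\lambda_2 - \lambda_1) + (\nu_{21} - \nu_{11}) \le 0,$ 


conditionally given that $S_1^2/S_0^2 > C$. Let $\hat\lambda_i$ and $\hat\nu_{ij}$ be the least squares 
estimates of the corresponding parameters so that 


$S_1^2 = \sum_{i=1}^{2} \sum_{j=1}^{m} \hat\nu_{ij}^2$ 

and let 

$Z_1 = \tfrac{1}{2}(\hat\lambda_2 - \hat\lambda_1 - \hat\nu_{21} + \hat\nu_{11}),$ 
$Z_2 = \tfrac{1}{2}(\hat\lambda_2 - \hat\lambda_1 + \hat\nu_{21} - \hat\nu_{11}),$ 
$S_1'^2 = S_1^2 - (\hat\nu_{21} - \hat\nu_{11})^2 = S_1^2 - (Z_2 - Z_1)^2.$ 

If $\zeta_i = E(Z_i)$, the hypothesis becomes $\zeta_2 \le 0$, and unbiasedness of the 
conditional test of $H_{31}$ with rejection region $R_{31}$ implies 

$P_{\zeta_2 = 0}\{R_{31} \mid z_1,\ s_0^2 + z_2^2,\ s_1'^2 \text{ and } S_1^2/S_0^2 > C\} = \alpha_{31}.$ 

The condition $S_1^2 > C S_0^2$ is equivalent to $(Z_2 - Z_1)^2 > C S_0^2 - S_1'^2$, which may be 
rewritten as 

$\left(Z_2 - \dfrac{Z_1}{1+C}\right)^2 + \dfrac{C Z_1^2}{(1+C)^2} > \dfrac{C T^2 - S_1'^2}{1+C}$ 

where $T^2 = S_0^2 + Z_2^2$. This is satisfied for all values of $Z_2$ if 

(13.8)    $\dfrac{C}{1+C} - \dfrac{S_1'^2}{(1+C)T^2} - \dfrac{C Z_1^2}{(1+C)^2 T^2} < 0$ 


and otherwise for the values of $Z_2$ for which either 

$\dfrac{Z_2}{T} < \dfrac{Z_1}{(1+C)T} - \sqrt{\dfrac{C}{1+C} - \dfrac{S_1'^2}{(1+C)T^2} - \dfrac{C Z_1^2}{(1+C)^2 T^2}} = K_1\left(\dfrac{Z_1}{T}, \dfrac{S_1'}{T}\right)$ 

or 

$\dfrac{Z_2}{T} > \dfrac{Z_1}{(1+C)T} + \sqrt{\dfrac{C}{1+C} - \dfrac{S_1'^2}{(1+C)T^2} - \dfrac{C Z_1^2}{(1+C)^2 T^2}} = K_2\left(\dfrac{Z_1}{T}, \dfrac{S_1'}{T}\right).$ 


Since $Z_2/T$ is independent of $Z_1/T$ and $S_1'/T$ when $\zeta_2 = 0$, the uniformly 
most powerful unbiased test of $H_{31}$ given $S_1^2/S_0^2 > C$ is then given by the 
rejection region 


$\dfrac{Z_2}{T} \ge K\left(\dfrac{Z_1}{T}, \dfrac{S_1'}{T}\right)$ 


with the function $K$ defined as follows. When $(z_1/t, s_1'/t)$ satisfies (13.8), 
we have 

$K\left(\dfrac{z_1}{t}, \dfrac{s_1'}{t}\right) = \kappa$ 

where 

$\int_{\kappa}^{1} p_R(r)\, dr = \alpha_{31},$ 

$p_R(r)$ being the probability density of 

$R = \dfrac{Z_2}{T} = \dfrac{\tfrac{1}{2}(\hat\lambda_2 - \hat\lambda_1 + \hat\nu_{21} - \hat\nu_{11})}{\sqrt{S_0^2 + \tfrac{1}{4}(\hat\lambda_2 - \hat\lambda_1 + \hat\nu_{21} - \hat\nu_{11})^2}}$ 

when $\zeta_2 = 0$. For all other values of $(z_1/t, s_1'/t)$, $K(z_1/t, s_1'/t)$ is given by 


$\int_{K(z_1/t,\, s_1'/t)}^{1} p_R(r)\, dr = \alpha_{31} \left[ \int_{-1}^{K_1(z_1/t,\, s_1'/t)} p_R(r)\, dr + \int_{K_2(z_1/t,\, s_1'/t)}^{1} p_R(r)\, dr \right].$ 
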

(iii) As an example involving more than two stages let us consider the 
determination of the degree of a regression polynomial. Let $Y_1, \cdots, Y_n$ be independently 
normally distributed with constant variance $\sigma^2$ and means 


$\eta_i = E(Y_i) = \alpha_0 + \alpha_1 x_i + \cdots + \alpha_s x_i^s \qquad (i = 1, \cdots, n).$ 


We shall assume that a polynomial of degree $s$ will in any case be adequate for 
our purposes, and wish to determine the smallest degree $r \le s$ that would also 
be satisfactory. It is convenient for this purpose to express the regression 
polynomial in terms of the orthogonal polynomials $P_j$ defined by 


$\sum_{k=1}^{n} P_i(x_k) P_j(x_k) = 1$ for $i = j$, and $= 0$ for $i \ne j$. 

Writing 

$\eta_i = c_0 + c_1 P_1(x_i) + \cdots + c_s P_s(x_i),$ 


we test successively the hypotheses $H_1: c_s = 0$, $H_2: c_{s-1} = 0$, $\cdots$, continuing as 
long as no rejection occurs. 

The problem can be stated in the following canonical form: $X_1, \cdots, X_s$ and 
$S_0^2$ are independent variables, with $X_i$ being distributed as $N(\xi_i, \sigma^2)$ and with 
$S_0^2/\sigma^2$ having a $\chi^2$-distribution with $n_0 = n - s$ degrees of freedom. One wishes 
to test consecutively the hypotheses $H_i: \xi_i = 0$ ($i = 1, \cdots, s$), continuing as 
long as no rejection occurs. Slightly more generally one might have variables 
$X_{ij}: N(\xi_{ij}, \sigma^2)$ ($j = 1, \cdots, n_i$; $i = 1, \cdots, s$) and $S_0^2$ and consider consecutively 
the hypotheses $H_i: \xi_{ij} = 0$ for $j = 1, \cdots, n_i$. Invariance reduces the problem 
to the statistics $S_0^2$ and $S_i^2 = \sum_{j=1}^{n_i} X_{ij}^2$, which are independent and where 
$S_i^2/\sigma^2$ has a $\chi^2$ distribution with $n_i$ degrees of freedom when $H_i$ is true. 

Let 


$U_i = \dfrac{S_i^2}{S_0^2}, \qquad V_i = \dfrac{S_i^2}{S_{i-1}^2}, \qquad W_i = V_i\left(1 + \dfrac{1}{U_i}\right) = \dfrac{S_i^2 + S_0^2}{S_{i-1}^2}.$ 


We shall now show inductively that the best unbiased conditional test of $H_i$, 
given the acceptance of $H_1, \cdots, H_{i-1}$, has an acceptance region of the form 


(13.9)    $U_i \le f_i(W_i, V_{i-1}, \cdots, V_2).$ 

The functions $f_i$ are defined by 

(13.10)   $f_1(w_1) = C, \qquad \int_0^{f_i} p_i(r)\, dr = (1 - \alpha_i) \int_0^{w_i h_{i-1}(v_{i-1}, \cdots, v_2) - 1} p_i(r)\, dr \quad (i \ge 2),$ 

where $p_i$ is the probability density of $S_i^2/S_0^2$ when $H_i$ is true, $h_i$ is defined by 

(13.11)   $h_1 = C, \qquad h_i(v_i, \cdots, v_2) = \min\bigl(v_i h_{i-1}(v_{i-1}, \cdots, v_2),\ g_i(v_i, \cdots, v_2)\bigr) \quad (i \ge 2),$ 


and $g_i$ is a function taking on values in the space of $u_i$ and defined by 


(13.12)   $f_i^{-1}(u, v_{i-1}, \cdots, v_2) = v_i\left(1 + \dfrac{1}{u}\right), \qquad u = g_i(v_i, \cdots, v_2) > 0,$ 


where $f_i^{-1}$ denotes the inverse of $f_i$ considered for fixed values of $v_{i-1}, \cdots, v_2$ 
as a function of $u$. It is seen that (13.10), (13.11) and (13.12) define $f_i$ in terms 
of $f_{i-1}$. 

The best unbiased test of $H_1$ has the acceptance region $u_1 \le C$ and hence 
satisfies (13.9). Suppose now that the acceptance regions $A_1, \cdots, A_{i-1}$ have 
been shown to be given by (13.9). To prove (13.9) for $A_i$ we need the fact that 

(13.13)   $A_1 \cap \cdots \cap A_{i-1} = \{(u, v): u_{i-1} \le h_{i-1}(v_{i-1}, \cdots, v_2)\}$ 


and since this is true for $i = 2$, we may accept (13.13) as part of our induction 
hypothesis. The condition of unbiasedness implies that $A_i$ should satisfy 


$P_{H_i}\{A_i \mid s_i^2 + s_0^2,\ s_1^2, \cdots, s_{i-1}^2 \text{ and } U_{i-1} \le h_{i-1}(V_{i-1}, \cdots, V_2)\} = 1 - \alpha_i.$ 

Now $U_{i-1} \le h_{i-1}$ is equivalent to 

$U_i \le W_i h_{i-1}(V_{i-1}, \cdots, V_2) - 1$ 

so that the best unbiased acceptance region for $H_i$ is 

$U_i \le K_i(W_i, S_1, \cdots, S_{i-1})$ 

where 

$P_{H_i}\{U_i \le K_i \mid s_i^2 + s_0^2,\ s_1^2, \cdots, s_{i-1}^2 \text{ and } U_i \le W_i h_{i-1}(V_{i-1}, \cdots, V_2) - 1\} = 1 - \alpha_i.$ 


Since $U_i$ is independent of $W_i, V_{i-1}, \cdots, V_2$ when $H_i$ is true, it is seen that 
$K_i$ depends only on $w_i, v_{i-1}, \cdots, v_2$ and that $A_i$ is given by (13.9). 
To complete the proof, it only remains to verify (13.13) for the set 
$A_1 \cap \cdots \cap A_i$ which is the intersection of 


$A_1 \cap \cdots \cap A_{i-1}$: $u_{i-1} \le h_{i-1}(v_{i-1}, \cdots, v_2)$ 

and 

$A_i$: $u_i \le f_i(w_i, v_{i-1}, \cdots, v_2).$ 


The inequality describing $A_i$ is equivalent to 


$f_i^{-1}(u_i, v_{i-1}, \cdots, v_2) \le w_i = v_i\left(1 + \dfrac{1}{u_i}\right)$ 

and hence to 

$A_i$: $u_i \le g_i(v_i, \cdots, v_2)$ 

by (13.12). Since the inequality describing $A_1 \cap \cdots \cap A_{i-1}$ is equivalent to 

$A_1 \cap \cdots \cap A_{i-1}$: $u_i \le v_i h_{i-1}(v_{i-1}, \cdots, v_2),$ 


the intersection $A_1 \cap \cdots \cap A_i$ is given by (13.13), as was to be proved. 

A closely related problem arises in the study of components of variance when 
one is dealing with a hierarchical classification. Here the problem reduces to 
independent statistics $S_1^2, S_2^2, \cdots$ where $S_i^2/\sigma_i^2$ has a $\chi^2$-distribution with $n_i$ 
degrees of freedom. It follows from the underlying model that $\sigma_1 \le \sigma_2 \le \cdots$, 
and one is interested in testing successively the hypotheses $H_1: \sigma_2 = \sigma_1$, 
$H_2: \sigma_3 = \sigma_2$, $\cdots$, continuing only if all the previous hypotheses were 
accepted. If $H_1, \cdots, H_i$ are true, the distribution of $S_1^2, \cdots, S_i^2$ is the same as 
in the preceding problem, and it is easily seen that exactly the same 
procedure is applicable to the present situation. 


14. Minimizing the maximum risk. For problems of the kind discussed in the 
preceding section unbiasedness is closely related to the minimax property. We 
begin by considering the two-decision problem of testing a hypothesis $H: \theta \in \omega$. 
Let $d_0$ and $d_1$ denote the decisions of accepting and rejecting $H$, and let the losses 
of false rejection and acceptance be $a$ and $b$ respectively. Then we have 

(i) A necessary condition for a procedure $\delta$ to be minimax is that it be 
unbiased. 

(ii) This condition is also sufficient if the probability $P_\theta(A)$ of any set $A$ is 
continuous in $\theta$ and if the common boundary of $\omega$ and $\omega^{-1}$ (the complement of $\omega$) is nonempty. 

To see (i) note that unbiasedness of a procedure $\delta_0$ implies 


(14.1)    $P_\theta(d_1) \le \dfrac{b}{a+b}$ for $\theta \in \omega$, $\qquad P_\theta(d_0) \le \dfrac{a}{a+b}$ for $\theta \in \omega^{-1}.$ 


Since the risk function of any procedure $\delta$ is given by 


(14.2)    $R_\delta(\theta) = a P_\theta(d_1)$ for $\theta \in \omega$, $\qquad R_\delta(\theta) = b P_\theta(d_0)$ for $\theta \in \omega^{-1},$ 

$\delta_0$ satisfies 


(14.3)    $\sup R_{\delta_0}(\theta) \le \dfrac{ab}{a+b}.$ 

Any minimax procedure $\delta_1$ must then also satisfy (14.3) and hence by (14.2) 
must also be unbiased. Suppose next that the assumptions of (ii) hold, let $\theta_0$ 
be a common boundary point of $\omega$ and $\omega^{-1}$, and let $\delta_0$ be unbiased. Then (14.1) 
and $P_\theta(d_0) = 1 - P_\theta(d_1)$ show that 


$P_{\theta_0}(d_1) = \dfrac{b}{a+b}$ and hence $R_{\delta_0}(\theta_0) = ab/(a + b).$ 


Thus for any unbiased procedure $\delta_0$ we have 


(14.4)    $\sup R_{\delta_0}(\theta) = \dfrac{ab}{a+b}.$ 

By (i), this relation holds in particular for any minimax procedure, so that 
$ab/(a + b)$ is the minimax risk and (ii) follows from (14.4). 
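
The role of the value $b/(a + b)$ can be checked with a few lines of Python. The sketch below is only an illustration: it looks at the two limiting risks at a common boundary point as functions of the rejection probability there, and searches a grid for the value that minimizes the larger of the two. The names `boundary_risk` and `minimax_size` and the losses chosen are hypothetical.

```python
def boundary_risk(a, b, alpha):
    """Larger of the two limiting risks at a common boundary point, where
    alpha is the probability of rejecting H (decision d_1) at that point."""
    return max(a * alpha, b * (1.0 - alpha))


def minimax_size(a, b):
    """Rejection probability b / (a + b) that equates the two limiting risks."""
    return b / (a + b)


# Illustrative losses: false rejection is 19 times as costly as false acceptance.
a, b = 19.0, 1.0
alpha_star = minimax_size(a, b)
grid = [k / 1000 for k in range(1001)]
best_alpha = min(grid, key=lambda alpha: boundary_risk(a, b, alpha))
print(alpha_star, boundary_risk(a, b, alpha_star), best_alpha)
```

With these losses the grid minimizer coincides with $b/(a+b)$, and the corresponding value of the larger limiting risk is $ab/(a+b)$, in agreement with (14.4).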

We shall now extend (ii) to the case of a number of hypotheses $H_i: \theta \in \omega_i$ 
($i = 1, \cdots, s$) to be tested in sequence, where each hypothesis $H_i$ is tested only 
if a particular prescribed chain of decisions has been reached on $H_1, \cdots, H_{i-1}$. 
For these problems we have 

THEOREM 3. Every unbiased procedure minimizes the maximum risk provided 
(i) $P_\theta(A)$ is a continuous function of $\theta$ for each $A$, (ii) the $2^s$ intersections $\bigcap_{i=1}^{s} \omega_i^{x_i}$ 
(each $x_i = \pm 1$) are all nonempty, (iii) there exists at least one parameter point $\theta_0$ 
which lies on the common boundary of $\omega_i$ and $\omega_i^{-1}$ for all $i$, (iv) the losses $a_i$ and $b_i$ 
for falsely rejecting and accepting $H_i$ satisfy condition (14.10) below. 

PROOF. Let us write $a_i^{-1}$ for $b_i$, let $d_i$ and $d_i^{-1}$ denote the decisions of rejecting 
and accepting $H_i$, and let $y_i = +1$ if $H_{i+1}$ is considered in case $H_i$ is rejected, 
and $y_i = -1$ in the contrary case. If $\theta \in \bigcap_{i=1}^{s} \omega_i^{x_i}$, the risk of a procedure $\delta$ is 
then given by 


$R_\delta(\theta) = a_1^{x_1} P(d_1^{x_1}) + P(d_1^{y_1}) \{ a_2^{x_2} P(d_2^{x_2} \mid d_1^{y_1}) + P(d_2^{y_2} \mid d_1^{y_1}) [ a_3^{x_3} P(d_3^{x_3} \mid d_1^{y_1} d_2^{y_2}) + \cdots ] \}$ 
$\qquad = a_1^{x_1} P(d_1^{x_1}) + a_2^{x_2} P(d_1^{y_1}) P(d_2^{x_2} \mid d_1^{y_1})$ 
$\qquad\quad + a_3^{x_3} P(d_1^{y_1}) P(d_2^{y_2} \mid d_1^{y_1}) P(d_3^{x_3} \mid d_1^{y_1} d_2^{y_2}) + \cdots.$ 


By comparing each of these expressions for which $x_i = 1$ with the corresponding 
one for which $x_i = -1$ but all the other $x$'s are the same, it is seen that 
unbiasedness implies and hence is equivalent to the conditions 


$P_\theta(d_i \mid d_1^{y_1} \cdots d_{i-1}^{y_{i-1}}) \le \dfrac{b_i}{a_i + b_i}$ for $\theta \in \omega_i,$ 


$P_\theta(d_i \mid d_1^{y_1} \cdots d_{i-1}^{y_{i-1}}) \ge \dfrac{b_i}{a_i + b_i}$ for $\theta \in \omega_i^{-1}$ 

for $i = 1, \cdots, s$. Putting $r_i = a_i b_i / (a_i + b_i)$ we see that the risk function of 
any unbiased procedure satisfies 


(14.5)    $\sup R_\delta(\theta) \le r_1 + \dfrac{a_1^{-y_1}}{a_1 + b_1}\, r_2 + \dfrac{a_1^{-y_1}}{a_1 + b_1} \dfrac{a_2^{-y_2}}{a_2 + b_2}\, r_3 + \cdots + \dfrac{a_1^{-y_1}}{a_1 + b_1} \cdots \dfrac{a_{s-1}^{-y_{s-1}}}{a_{s-1} + b_{s-1}}\, r_s = r^*.$ 


It now remains to show that $r^*$ is the minimax risk. 
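Consider any procedure $\delta$, and let 


$\alpha_i = P_{\theta_0}(d_i \mid d_1^{y_1} \cdots d_{i-1}^{y_{i-1}}).$ 


Then it follows from assumptions (i), (ii) and (iii) that 


(14.6)    $\sup R_\delta(\theta) \ge \max_{x_1, \cdots, x_s} [ (a_1 \alpha_1)^{x_1} + \alpha_1^{y_1} (a_2 \alpha_2)^{x_2} + \cdots + \alpha_1^{y_1} \cdots \alpha_{s-1}^{y_{s-1}} (a_s \alpha_s)^{x_s} ]$ 


since the $2^s$ expressions in brackets are the possible values of $\lim R_\delta(\theta)$ as $\theta$ tends 
to $\theta_0$. (Here $\alpha_i^{-1}$ stands for $1 - \alpha_i$, so that $(a_i \alpha_i)^{-1} = b_i (1 - \alpha_i)$.) Let us now 
minimize the right-hand side of (14.6). Given any $\alpha_1, \cdots, \alpha_{s-1}$, 
$x_1, \cdots, x_{s-1}$, the maximum of the pair of expressions 


$(a_1 \alpha_1)^{x_1} + \cdots + \alpha_1^{y_1} \cdots \alpha_{s-2}^{y_{s-2}} (a_{s-1} \alpha_{s-1})^{x_{s-1}} + \alpha_1^{y_1} \cdots \alpha_{s-1}^{y_{s-1}} (a_s \alpha_s),$ 
$(a_1 \alpha_1)^{x_1} + \cdots + \alpha_1^{y_1} \cdots \alpha_{s-2}^{y_{s-2}} (a_{s-1} \alpha_{s-1})^{x_{s-1}} + \alpha_1^{y_1} \cdots \alpha_{s-1}^{y_{s-1}} (a_s \alpha_s)^{-1},$ 

is clearly minimized by minimizing 


$\max[a_s \alpha_s,\ a_s^{-1} (1 - \alpha_s)].$ 

This gives $\alpha_s = \alpha_s^*$ where we put 

(14.7)    $\alpha_i^* = \dfrac{a_i^{-1}}{a_i + a_i^{-1}} = \dfrac{b_i}{a_i + b_i}$ 

so that 

(14.8)    $a_i \alpha_i^* = (a_i \alpha_i^*)^{-1}.$ 


We can now proceed inductively. Suppose that it has already been shown for 
any fixed $\alpha_1, \cdots, \alpha_i$ that the right-hand side of (14.6) is minimized by $\alpha_j = \alpha_j^*$ 
for $j = i + 1, \cdots, s$. Consider now the minimization of the maximum of the 
quantities 


$(a_1 \alpha_1)^{x_1} + \cdots + \alpha_1^{y_1} \cdots \alpha_{i-1}^{y_{i-1}} (a_i \alpha_i) + \alpha_1^{y_1} \cdots \alpha_i^{y_i} [ (a_{i+1} \alpha_{i+1}^*)^{x_{i+1}} + (\alpha_{i+1}^*)^{y_{i+1}} (a_{i+2} \alpha_{i+2}^*)^{x_{i+2}} + \cdots ],$ 
$(a_1 \alpha_1)^{x_1} + \cdots + \alpha_1^{y_1} \cdots \alpha_{i-1}^{y_{i-1}} (a_i \alpha_i)^{-1} + \alpha_1^{y_1} \cdots \alpha_i^{y_i} [ (a_{i+1} \alpha_{i+1}^*)^{x_{i+1}} + (\alpha_{i+1}^*)^{y_{i+1}} (a_{i+2} \alpha_{i+2}^*)^{x_{i+2}} + \cdots ],$ 


where the expression in brackets by (14.8) is independent of the $x$'s and hence 
equal to 

(14.9)    $k_i = a_{i+1} \alpha_{i+1}^* + (\alpha_{i+1}^*)^{y_{i+1}} a_{i+2} \alpha_{i+2}^* + \cdots + (\alpha_{i+1}^*)^{y_{i+1}} \cdots (\alpha_{s-1}^*)^{y_{s-1}} a_s \alpha_s^*.$ 


Eliminating the common terms at the beginning, and the common factor 
$\alpha_1^{y_1} \cdots \alpha_{i-1}^{y_{i-1}}$ of the terms involving $\alpha_i$, we see that it is enough to minimize 


$\max[a_i \alpha_i + k_i \alpha_i^{y_i},\ b_i \alpha_i^{-1} + k_i \alpha_i^{y_i}].$ 


This is achieved by equating the two quantities, which gives $\alpha_i = \alpha_i^*$, provided 
the coefficient of $\alpha_i$ in $b_i + (k_i - b_i)\alpha_i$ is $< 0$ in case $y_i = 1$, or that the coefficient 
of $\alpha_i$ in $a_i \alpha_i + k_i (1 - \alpha_i)$ is $> 0$ in case $y_i = -1$. Thus the right-hand side of 
(14.6) is minimized by putting $\alpha_i = \alpha_i^*$ for all $i$, provided 

(14.10)   $a_{i+1} \alpha_{i+1}^* + (\alpha_{i+1}^*)^{y_{i+1}} a_{i+2} \alpha_{i+2}^* + \cdots + (\alpha_{i+1}^*)^{y_{i+1}} \cdots (\alpha_{s-1}^*)^{y_{s-1}} a_s \alpha_s^* < a_i^{-y_i}.$ 


Putting $\alpha_i = \alpha_i^*$ in the right-hand side of (14.6) one then sees that the $2^s$ 
quantities in brackets become equal, and that their common value is $r^*$. Thus, for 
any procedure $\delta$, we have 

$\sup R_\delta(\theta) \ge r^*$ 


and the desired result now follows from comparison with (14.5). 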
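
The quantities in this proof are easy to compute for a small example. The Python sketch below is an illustration only: it evaluates the bracketed expressions of (14.6) at the probabilities of (14.7), checks condition (14.10), and compares with a coarse grid of other rejection probabilities. The function names and the losses are hypothetical, and `y` lists only $y_1, \cdots, y_{s-1}$ since $y_s$ plays no role.

```python
from itertools import product


def alpha_star(a, b):
    """Rejection probabilities alpha_i^* = b_i / (a_i + b_i) of (14.7)."""
    return [bi / (ai + bi) for ai, bi in zip(a, b)]


def bracket(a, b, alpha, x, y):
    """One of the 2^s bracketed expressions in (14.6).

    x[i] = +1: H_i is true (the possible error is a false rejection);
    x[i] = -1: H_i is false (the possible error is a false acceptance);
    y[i] = +1: the next hypothesis is tested after rejection of H_i,
    y[i] = -1: after acceptance.
    """
    total, reach = 0.0, 1.0
    for i in range(len(a)):
        error = a[i] * alpha[i] if x[i] == 1 else b[i] * (1.0 - alpha[i])
        total += reach * error
        if i < len(y):
            reach *= alpha[i] if y[i] == 1 else 1.0 - alpha[i]
    return total


def worst_case(a, b, alpha, y):
    """Right-hand side of (14.6): maximum over all sign patterns x."""
    return max(bracket(a, b, alpha, x, y)
               for x in product((1, -1), repeat=len(a)))


def condition_14_10(a, b, y):
    """Check k_i < a_i^{-y_i} for i = 1, ..., s-1, with k_i as in (14.9)."""
    star, s = alpha_star(a, b), len(a)
    for i in range(s - 1):
        k_i, reach = 0.0, 1.0
        for j in range(i + 1, s):
            k_i += reach * a[j] * star[j]
            if j < s - 1:
                reach *= star[j] if y[j] == 1 else 1.0 - star[j]
        bound = b[i] if y[i] == 1 else a[i]   # a_i^{-y_i}, with a_i^{-1} = b_i
        if not k_i < bound:
            return False
    return True


# Illustrative losses satisfying (14.11): all y_i = +1 and b_1 >= b_2 >= b_3.
a, b, y = [1.0, 1.0, 1.0], [3.0, 2.0, 1.0], [1, 1]
star = alpha_star(a, b)
values = [bracket(a, b, star, x, y) for x in product((1, -1), repeat=3)]
r_star = values[0]
print(condition_14_10(a, b, y), r_star)
```

At the probabilities of (14.7) all $2^s$ bracketed values agree, which is the equalization step of the proof. The function `worst_case` can be used to compare $r^*$ with the maximum obtained for any other choice of the rejection probabilities.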
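COROLLARY. The conclusion of Theorem 3 holds if (14.10) is replaced by either 


(14.11)   $y_i = 1$ for all $i$ and $b_1 \ge b_2 \ge \cdots \ge b_s$ 

or 

(14.12)   $y_i = -1$ for all $i$ and $a_1 \ge a_2 \ge \cdots \ge a_s.$ 


PROOF. It is necessary only to show that each of the conditions (14.11) and 
(14.12) implies (14.10). If all the $y$'s are $+1$ the left-hand side of (14.10) may be 
rewritten as 


$b_{i+1} (\alpha_{i+1}^*)^{-1} + b_{i+2}\, \alpha_{i+1}^* (\alpha_{i+2}^*)^{-1} + \cdots + b_s\, \alpha_{i+1}^* \cdots \alpha_{s-1}^* (\alpha_s^*)^{-1}.$ 


If (14.11) holds, this is 


$\le b_{i+1} [ (\alpha_{i+1}^*)^{-1} + \alpha_{i+1}^* (\alpha_{i+2}^*)^{-1} + \cdots + \alpha_{i+1}^* \cdots \alpha_{s-1}^* (\alpha_s^*)^{-1} ]$ 
$= b_{i+1} [ 1 - \alpha_{i+1}^* \alpha_{i+2}^* \cdots \alpha_s^* ] < b_{i+1} \le b_i.$ 

Similarly, if (14.12) holds, the left-hand side of (14.10) is 

$\le a_{i+1} [ \alpha_{i+1}^* + (\alpha_{i+1}^*)^{-1} \alpha_{i+2}^* + \cdots + (\alpha_{i+1}^*)^{-1} \cdots (\alpha_{s-1}^*)^{-1} \alpha_s^* ]$ 
$= a_{i+1} [ 1 - (\alpha_{i+1}^*)^{-1} \cdots (\alpha_s^*)^{-1} ] < a_i.$ 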

If the assumptions of Theorem 3 are not satisfied it is sometimes possible to 
prove a slightly weaker result. We shall illustrate this with the simplest case of 
Example (ii) of Sec. 13. Here one is concerned with four means $\xi_{ij}$ ($i, j = 1, 2$) 
given by 

$\xi_{11} = \lambda + \mu + \nu, \qquad \xi_{12} = \lambda - \mu - \nu,$ 
$\xi_{21} = -\lambda + \mu - \nu, \qquad \xi_{22} = -\lambda - \mu + \nu.$ 

The first hypothesis tested is $H_0: \nu = 0$. In case of acceptance this is followed by 
$H_1: \lambda \le 0$, while in case of rejection one is interested in the two hypotheses 
$H_2: \xi_{11} \le \xi_{21}$ which is equivalent to $\lambda + \nu \le 0$ and $H_3: \xi_{12} \le \xi_{22}$ which is equivalent 
to $\lambda - \nu \le 0$. The feature which complicates this problem is that when $H_0$ is 
true the three remaining hypotheses must either all be true or all be false. Thus 
the condition corresponding to (ii) of Theorem 3 is not satisfied. 

Let us denote the decision of rejecting or accepting $H_i$ by $d_i$ and $d_i^{-1}$ as before, 
and consider the class $\mathcal{C}$ of procedures satisfying 

bo 

ao + bo 

_ bh 
ay, + b; 


P(d) < »=0, Pd’) s—“— if »¥0 


a + bp 


P(d,| do") = iff \+7r<0, 


> 1 1 aids . 
(14.13) P(d;" | do") S if X+v>0 


b; 


. < 
P(d; | do) ath 


if H; is true, 
P(d;'|do) < —“‘— if Hyisfalse (i = 2,3), 
a; + b; 


We shall then prove that any element of $\mathcal{C}$, which is a subclass of the class of 
unbiased procedures, is minimax, under the additional assumption that 


$a_i = a$ and $b_i = b$ for all $i$. 


For any procedure $\delta$, consider the error probabilities $\alpha_0 = P(d_0)$, $\alpha_1 = P(d_1 \mid d_0^{-1})$, 
$\alpha_i = P(d_i \mid d_0)$ ($i = 2, 3$), evaluated for $\lambda = \nu = 0$ and some fixed values $\mu_0$ and 
$\sigma_0$ of $\mu$ and $\sigma$. Then the possible values of $\lim R_\delta(\theta)$ as $\theta = (\lambda, \mu, \nu, \sigma)$ tends to 
$\theta_0 = (0, \mu_0, 0, \sigma_0)$ are 


(14.14) 
$a_1 (1 - \alpha_0) \alpha_1 + \alpha_0 [a_0 + a_2 \alpha_2 + a_3 \alpha_3]$ for $\theta \in \omega_0 \omega_1 \omega_2 \omega_3$; 
$b_1 (1 - \alpha_0)(1 - \alpha_1) + \alpha_0 [a_0 + b_2 (1 - \alpha_2) + b_3 (1 - \alpha_3)]$ for $\theta \in \omega_0 \omega_1^{-1} \omega_2^{-1} \omega_3^{-1}$; 
$(1 - \alpha_0)[b_0 + a_1 \alpha_1] + \alpha_0 [a_2 \alpha_2 + a_3 \alpha_3]$ for $\theta \in \omega_0^{-1} \omega_1 \omega_2 \omega_3$; 
$(1 - \alpha_0)[b_0 + a_1 \alpha_1] + \alpha_0 [a_2 \alpha_2 + b_3 (1 - \alpha_3)]$ for $\theta \in \omega_0^{-1} \omega_1 \omega_2 \omega_3^{-1}$; 
$(1 - \alpha_0)[b_0 + a_1 \alpha_1] + \alpha_0 [b_2 (1 - \alpha_2) + a_3 \alpha_3]$ for $\theta \in \omega_0^{-1} \omega_1 \omega_2^{-1} \omega_3$; 
$(1 - \alpha_0)[b_0 + a_1 \alpha_1] + \alpha_0 [b_2 (1 - \alpha_2) + b_3 (1 - \alpha_3)]$ for $\theta \in \omega_0^{-1} \omega_1 \omega_2^{-1} \omega_3^{-1}$; 
$(1 - \alpha_0)[b_0 + b_1 (1 - \alpha_1)] + \alpha_0 [a_2 \alpha_2 + a_3 \alpha_3]$ for $\theta \in \omega_0^{-1} \omega_1^{-1} \omega_2 \omega_3$; 
$(1 - \alpha_0)[b_0 + b_1 (1 - \alpha_1)] + \alpha_0 [a_2 \alpha_2 + b_3 (1 - \alpha_3)]$ for $\theta \in \omega_0^{-1} \omega_1^{-1} \omega_2 \omega_3^{-1}$; 
$(1 - \alpha_0)[b_0 + b_1 (1 - \alpha_1)] + \alpha_0 [b_2 (1 - \alpha_2) + a_3 \alpha_3]$ for $\theta \in \omega_0^{-1} \omega_1^{-1} \omega_2^{-1} \omega_3$; 
$(1 - \alpha_0)[b_0 + b_1 (1 - \alpha_1)] + \alpha_0 [b_2 (1 - \alpha_2) + b_3 (1 - \alpha_3)]$ for $\theta \in \omega_0^{-1} \omega_1^{-1} \omega_2^{-1} \omega_3^{-1}$. 


With $u = \alpha_0$, $v = (1 - \alpha_0)\alpha_1$, $w = \alpha_0 \alpha_2$, $z = \alpha_0 \alpha_3$ and $a_i = a$, $b_i = b$ these 
quantities become 


(14.15) 
$a[u + v + w + z]$ 
$(a + b)u - bv - bw - bz + b$ 
$-bu + av + aw + az + b$ 
$av + aw - bz + b$ 
$av - bw + az + b$ 
$bu + av - bw - bz + b$ 
$-2bu - bv + aw + az + 2b$ 
$-bu - bv + aw - bz + 2b$ 
$-bu - bv - bw + az + 2b$ 
$-2bu - bv - bw - bz + 2b.$ 


From the first form of these 10 quantities it is seen that they all become equal 
for $\alpha_i = \alpha_i^* = b_i / (a_i + b_i)$. Let the corresponding values of $(u, v, w, z)$ be $(u_0, 
v_0, w_0, z_0)$. We shall now show that if we change from $(u_0, v_0, w_0, z_0)$ to some 
other point $(u_1, v_1, w_1, z_1)$ at least one of the 10 quantities will be increased so 
that $(u_0, v_0, w_0, z_0)$ and hence $\alpha_i = \alpha_i^*$ minimizes their maximum. If in (14.15) 
all possible sign combinations were occurring, this would be obvious. For then 
in at least one row all of the four increments $\pm(u_1 - u_0)$, $\pm(v_1 - v_0)$, $\pm(w_1 - w_0)$, 
$\pm(z_1 - z_0)$ would be positive. But of the 16 possible combinations only 12 occur 
(with rows 4 and 5 each counting for two combinations), the missing 
combinations being $- + - -$, $+ - + +$, $+ - + -$, $+ - - +$. 

As an example let us consider the first possibility. Suppose that $u_1 = u_0 - \xi$, 
$v_1 = v_0 + \eta$, $w_1 = w_0 - \lambda$, $z_1 = z_0 - \zeta$, with $\xi, \eta, \lambda, \zeta$ all nonnegative. The 
total change in the first and tenth rows of (14.15) will be 


$a(-\xi + \eta - \lambda - \zeta)$ and $b(2\xi - \eta + \lambda + \zeta).$ 


Since both of these are to be $\le 0$, we have 


$2\xi + \lambda + \zeta \le \eta \le \xi + \lambda + \zeta$ 

and hence $\xi = 0$. But then the change in the sixth row will be positive unless also 
$\eta = \lambda = \zeta = 0$. The other three possibilities can be ruled out in a similar manner 
so that $\alpha_i = \alpha_i^*$ minimizes the maximum of the 10 quantities in question. If $r^*$ 
denotes the common value $a[u_0 + v_0 + w_0 + z_0]$ of these quantities, it 
only remains to show that (14.13) implies $R_\delta(\theta) \le r^*$ for all $\theta$. This is clearly the 
case since for each $i$ the coefficient of $\alpha_i$ in (14.14) is $\ge 0$ in the lines 
corresponding to $\theta \in \omega_i$ and $\le 0$ otherwise. 

The result that the natural combination of a number of best unbiased or 
minimax tests leads to a minimax procedure for the compound problem does 
unfortunately not hold even in the simplest cases in which the different 
hypotheses concern the same parameter. Consider for example problem (i) of section 3, 
in which the simultaneous consideration of $H_1: \theta \ge \theta_0$ and $H_2: \theta \le \theta_0$ leads to the 
choice between the three decisions $d_2: \theta < \theta_0$, $d_0: \theta = \theta_0$ and $d_1: \theta > \theta_0$. If the 
losses of false rejection and acceptance are $a$ and $b$ for both hypotheses the risk 
function is given by 


$R_\delta(\theta) = b P_\theta(d_0) + (a + b) P_\theta(d_1)$ for $\theta < \theta_0$, 
$R_\delta(\theta) = a [P_\theta(d_1) + P_\theta(d_2)]$ for $\theta = \theta_0$, 
$R_\delta(\theta) = b P_\theta(d_0) + (a + b) P_\theta(d_2)$ for $\theta > \theta_0$. 


If $\alpha_i = P_{\theta_0}(d_i)$ we have 


(14.16)   $\sup R_\delta(\theta) \ge \max\{a\alpha_1 - b\alpha_2 + b,\ a(\alpha_1 + \alpha_2),\ a\alpha_2 - b\alpha_1 + b\}.$ 


For a given value of $\alpha_1 + \alpha_2$, the maximum of the first and third terms on the 
right-hand side is minimized by equating them, which gives $\alpha_1 = \alpha_2$ ($= \alpha$, say) and reduces 
the right-hand side of (14.16) to 


$\max[(a - b)\alpha + b,\ 2a\alpha].$ 


In the usual case that $a > b$, this is minimized for $\alpha = 0$, that is, for $\alpha_0 = 1$, 
$\alpha_1 = \alpha_2 = 0$. In the standard case that the family $P_\theta$ is homogeneous this implies 
$P_\theta(d_0) = 1$ for all $\theta$, the risk function of which is 


$R(\theta) = b$ if $\theta \ne \theta_0$, $\qquad R(\theta) = 0$ if $\theta = \theta_0$, 


which is then clearly minimax. 
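
The minimization of the right-hand side of (14.16) can be confirmed by a direct search. The following Python sketch is an illustration only, with hypothetical names and loss values: it evaluates the three terms of (14.16) on a grid of pairs $(\alpha_1, \alpha_2)$ with $\alpha_1 + \alpha_2 \le 1$ and returns the minimizing pair.

```python
def worst_case_bound(a, b, alpha1, alpha2):
    """Right-hand side of (14.16) for boundary probabilities alpha1, alpha2."""
    return max(a * alpha1 - b * alpha2 + b,
               a * (alpha1 + alpha2),
               a * alpha2 - b * alpha1 + b)


def minimize_on_grid(a, b, steps=100):
    """Brute-force search over alpha1, alpha2 >= 0 with alpha1 + alpha2 <= 1."""
    best = None
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            alpha1, alpha2 = i / steps, j / steps
            value = worst_case_bound(a, b, alpha1, alpha2)
            if best is None or value < best[0]:
                best = (value, alpha1, alpha2)
    return best


# Illustrative losses with a > b, the case treated in the text.
best_value, best_alpha1, best_alpha2 = minimize_on_grid(a=2.0, b=1.0)
print(best_value, best_alpha1, best_alpha2)
```

For $a > b$ the search settles on $\alpha_1 = \alpha_2 = 0$, that is, on the procedure that always takes decision $d_0$, which is the degenerate minimax solution described above.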
INVARIANCE, MINIMAX SEQUENTIAL ESTIMATION, AND 
CONTINUOUS TIME PROCESSES 


By J. Kiefer¹ 


Cornell University 


1. Introduction and summary. The main purpose of this paper is to prove, by 
the method of invariance, that in certain sequential decision problems (discrete 
and continuous time) there exists a minimax procedure $\delta^*$ among the class of all 
sequential decision functions such that $\delta^*$ observes the process for a constant 
length of time. In the course of proving these results a general invariance theorem 
will be proved (Sec. 3) under conditions which are easy to verify in many im- 
portant examples (Sec. 2). A brief history of the invariance theory will be re- 
counted in the next paragraph. The theorem of Sec. 3 is to be viewed only as a 
generalization of one due to Peisakoff [1]; the more general setting here (see 
Sec. 2; the assumptions of [1] are discussed under Condition 2b) is convenient 
for many applications, and some of the conditions of Sec. 2 (and the proofs that 
they imply the assumptions) are new; but the method of proof used in Sec. 3 is 
only a slight modification of that of [1]. The form of this extension of [1] in Secs. 
2 and 3, and the results of Secs. 4 and 5, are new as far as the author knows. 

In 1939 Pitman [2] suggested on intuitive grounds the use of best invariant 
procedures in certain problems of estimation and testing hypotheses concerning 
scale and location parameters. In the same year Wald [3] had the idea that the 
theorem of Sec. 3 should be valid for certain nonsequential problems of esti- 
mating a location parameter; unfortunately, as Peisakoff points out, there seems 
to be a lacuna in Wald’s proof. During the war Hunt and Stein [4] proved the 
theorem for certain problems in testing hypotheses in their famous unpublished 
paper whose results have been described by Lehmann in [5a], [5b]. Peisakoff’s 
previously cited work [1] of 1950 contains a comprehensive and fairly general 
development of the theory and includes many topics such as questions of ad- 
missibility and consideration of vector-valued risk functions which will not be 
considered in the present paper (the latter could be included by using the device 
of taking linear combinations of the components of the risk vector). Girshick 
and Savage [6] at about the same time gave a proof of the theorem for the loca- 
tion parameter case with squared error or bounded loss function. In their book 
[7], Blackwell and Girshick in the discrete case prove the theorem for location 
(or scale) parameters. The referee has called the author’s attention to a paper by 
H. Kudo in the Nat. Sci. Report of the Ochanomizu University (1955), in which 
certain nonsequential invariant estimation problems are treated by extending 
the method of [7]. All of the results mentioned above are nonsequential. Peisakoff 
[1] mentions that sequential analysis can be considered in his development, 


Received July 12, 1956; revised March 15, 1957. 
1 Research sponsored by the Office of Naval Research. 




but (see Sec. 4) his considerations would not yield the results of the present paper. 

A word should be said about the possible methods of proof. (The notation used 
here is that of Sec. 2 but will be familiar to readers of decision theory.) The 
method of Hunt and Stein, extended to problems other than testing hypotheses, 
is to consider for any decision function $\delta$ a sequence of decision functions $\{\delta_n\}$ 
defined by 


$\delta_n(x, \Delta) = \int_{G_n} \delta(gx, g\Delta)\, \mu(dg) \Big/ \mu(G_n)$ 


where $\mu$ is left Haar measure on a group $G$ of transformations leaving the problem 
invariant and $\{G_n\}$ is a sequence of subsets of $G$ of finite $\mu$-measure and such 
that $G_n \to G$ in some suitable sense. If $G$ were compact, we could take $\mu(G) = 1$ 
and let $G_n = G$; it would then be clear that $\delta_n$ is invariant and that $\sup_F r_{\delta_n}(F) \le 
\sup_F r_\delta(F)$, yielding the conclusion of the theorem of Sec. 3. If $G$ is not 
compact, an invariant procedure $\delta_0$ which is the limit in some sense of the sequence 
$\{\delta_n\}$ must be obtained (this includes proving that, in Lehmann's terminology, 
suitable conditions imply that any almost invariant procedure is equivalent to an 
invariant one) and $\sup_F r_{\delta_0}(F) \le \sup_F r_\delta(F)$ must be proved. Peisakoff's method differs 
somewhat from this, in that for each $\delta$ one considers a family $\{\delta_g\}$ of procedures 
obtained in a natural way from $\delta$, and shows that an average over $G_n$ of the 
supremum risks of the $\delta_g$ does not exceed that of $\delta$ as $n \to \infty$; there is an obvious 
relationship between the two methods. Similarly, in [7] the average of $r_\delta(gF_0)$ for 
$g$ in $G_n$ and some $F_0$ is compared with that of an optimum invariant procedure 
(the latter can thus be seen to be Bayes in the wide sense); the method of [6] 
is in part similar. In some problems it is convenient (see Example iii and Remark 
7 in Sec. 2) to apply the method of Hunt and Stein to a compact group as indi- 
cated above in conjunction with the use of Peisakoff’s method for a group which 
is not compact. The possibility of having an unbounded weight function does 
not arise in the Hunt-Stein work. Peisakoff handles it by two methods, only one 
of which is used in the present paper, namely, to truncate the loss function. The 
other method (which also uses a different assumption from Assumption 5) is to 
truncate the region of integration in obtaining the risk function. Peisakoff gives 
several conditions (usually of symmetry or convexity) which imply Assumption 4 
of Sec. 2 or the corresponding assumption for his second method of proof in the 
cases treated by him, but does not include Condition 4b or 4c of Sec. 2. Blackwell 
and Girshick use Condition 4b for a location parameter in the discrete case with 
W continuous and not depending on z, using a method of proof wherein it is the 
region of integration rather than the loss function which is truncated. (The proof 
in [6] is similar, using also the special form of W there.) It is Condition 4c which 
is pertinent for many common weight functions used in estimating a scale param- 
eter, e.g., any positive power of relative error in the problem of estimating the 
standard deviation of a normal df. 
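
The compact-group case of the Hunt and Stein averaging described above can be illustrated on a toy problem. The Python sketch below is not taken from the paper: the two-element group of sign changes, the integer location model with symmetric noise, the squared-error loss and all names are assumptions made for illustration. It averages an arbitrary non-invariant estimator over the group, as a randomized rule, and compares the supremum risks.

```python
# Toy invariant problem (an assumption made for illustration only):
# X = theta + noise with symmetric integer noise, squared-error loss, and the
# compact group G = {+1, -1} acting by x -> g*x, theta -> g*theta, d -> g*d.
NOISE = {-1: 0.25, 0: 0.5, 1: 0.25}
THETAS = range(-3, 4)
GROUP = (1, -1)


def delta(x):
    """An arbitrary estimator that is not invariant under sign changes."""
    return x + 1 if x > 0 else x


def original_rule(x):
    """delta written as a randomized rule: a list of (probability, decision)."""
    return [(1.0, delta(x))]


def averaged_rule(x):
    """Average of delta over G with normalized (uniform) Haar measure: with
    probability 1/|G| take the decision g^{-1} delta(g x); here g^{-1} = g."""
    return [(1.0 / len(GROUP), g * delta(g * x)) for g in GROUP]


def risk(theta, rule):
    """Exact risk: expected squared error under the noise and the rule."""
    return sum(p_noise * p_dec * (dec - theta) ** 2
               for noise, p_noise in NOISE.items()
               for p_dec, dec in rule(theta + noise))


sup_original = max(risk(t, original_rule) for t in THETAS)
sup_averaged = max(risk(t, averaged_rule) for t in THETAS)
print(sup_original, sup_averaged)
```

In this toy model the risk of the averaged rule at $\theta$ is the average of the risks of the original rule at $\theta$ and $-\theta$, so its supremum cannot exceed that of the original rule, and its risk function is symmetric in $\theta$.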

The overlap of the results of Secs. 4 and 5 of the present paper with previous 
publications will now be described. There are now three known methods for 
proving the minimax character of decision functions. Wolfowitz [8] used the 
Bayes method for a great variety of weight functions for the case of sequential 
estimation of a normal distribution with unknown mean (see also [9]). Hodges 
and Lehmann [10] used their Cramér-Rao inequality method for a particular 
weight function in the case of the normal distribution with unknown mean and 
gamma distribution with unknown scale (as well as in some other cases not 
pertinent here) to obtain a slightly weaker minimax result (see the discussion in 
Sec. 6.1 of [12]) than that obtainable by the Bayes method. The Bayes method 
was used in the sequential case by Kiefer [11] in the case of a rectangular dis- 
tribution with unknown scale or exponential distribution with unknown location, 
for a particular weight function. This method was used by Dvoretzky, Kiefer 
and Wolfowitz in [12] for discrete and continuous time sequential problems in- 
volving the Wiener, gamma, Poisson, and negative binomial processes, for par- 
ticular classes of weight functions. The disadvantage of using the Cramér-Rao 
method is in the limitation of its applicability in weight function and in regularity 
conditions which must be satisfied, as well as in the weaker result it yields. 
The Bayes method has the disadvantage that, when a least favorable a priori 
distribution does not exist, computations become unpleasant in proving the 
existence (if there is one) of a constant-time minimax procedure unless an ap- 
propriate sequence of a priori distributions can be chosen in such a way that the 
a posteriori expected loss at each stage does not depend on the observations 
(this is also true in problems where we are restricted to a fixed experimentation 
time or size, but it is less of a complication there); thus, the weight functions 
considered in [12] for the gamma distribution were only those relative to which 
such sequences could be easily guessed, while the proof in [11] is made messy by 
the author’s inability to guess such a sequence, and even in [8] the computations 
become more involved in the case where an unsymmetric weight function is 
treated. (If, e.g., $\mathcal{F}$ is isomorphic to G, the sequence of a priori distributions obtained
by truncating $\mu$ to $G_n$ in the previous paragraph would often be convenient
for proving the minimax character by the Bayes method if it were not for the 
complication just noted.) The third method, that of invariance, has the obvious 
shortcoming of yielding little unless the group G is large enough and/or there 
exists a simple sequence of sufficient statistics; however, when it applies to the 
extent that it does in the examples of Secs. 4 and 5, it reduces the minimax prob- 
lem to a trivial problem of minimization. 

Several other sequential problems treated in Section 4 seem never to have 
been treated previously by any method or for any weight function; some of these 
involve both an unknown scale and unknown location parameter. A multivariate 
example is also treated in Sec. 4. In example xv of Sec. 4 will be found some 
remarks which indicate when the method used there can or cannot be applied 
successfully. 

In Sec. 5, in addition to treating continuous time sequential problems in a
manner similar to that of Sec. 4, we consider another type of problem where the
group G acts on the time parameter of the process rather than on the values of
the sample function.


2. Assumptions, conditions, examples, and counterexamples. We use the set-up
and notation of a fixed sample-size decision problem (the inclusion of the
sequential case will be described in Secs. 4 and 5). A random variable X takes on
values in $\mathcal{X}$, which we may think of as being the underlying sample space with
Borel field $B_{\mathcal{X}}$. The family $\mathcal{F}$ (possible states of nature) is a class of probability
measures on $(\mathcal{X}, B_{\mathcal{X}})$. We write $P_F\{\ \}$ and $E_F\{\ \}$ to mean "probability of" and
"expected value of" when X has probability measure F. The decision space D
has a Borel field $B_D$ associated with it. The weight function W we take to be
extended real (possibly $+\infty$) and nonnegative (this could be generalized) on
$\mathcal{F} \times \mathcal{X} \times D$, jointly measurable in its last two arguments. $\mathcal{D}$ is the class of
decision functions $\delta$ from $\mathcal{X} \times B_D$ into the unit interval which are available to the
statistician (not necessarily all possible $\delta$); each such $\delta$ is measurable in its first
argument and a probability measure in its second one. For fixed F and $\delta$, a
probability measure $m_{F,\delta}$ on $\mathcal{X} \times D$ is defined by its values on rectangles being given
by $m_{F,\delta}(Q \times R) = E_F\{\chi_Q(X)\,\delta(X, R)\}$, where $\chi_Q$ is the characteristic function
of Q. The risk function of $\delta$ is given by $r_\delta(F) = \int W(F, x, s)\, m_{F,\delta}(dx, ds)$. We
define $\bar r_\delta = \sup_{F \in \mathcal{F}} r_\delta(F)$.
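
As an illustration of these definitions only, the following sketch evaluates the risk function and its supremum in a toy problem where the sample space, the family of states, and the decision space are all finite, so that the integral reduces to a double sum. The two distributions, the 0-1 weight function, and the randomized decision function `delta` are hypothetical choices made for the example; none of them comes from the paper.

```python
from fractions import Fraction as Fr

# Hypothetical toy problem: X takes values in {0, 1}; two states of nature;
# decision space D = {0, 1}.
states = {
    "F_a": [Fr(3, 4), Fr(1, 4)],  # P(X=0), P(X=1) under F_a
    "F_b": [Fr(1, 4), Fr(3, 4)],  # P(X=0), P(X=1) under F_b
}
correct = {"F_a": 0, "F_b": 1}    # decision regarded as correct under each state

def W(F, x, d):
    """Weight function: 0 for the correct decision, 1 otherwise (assumed)."""
    return 0 if d == correct[F] else 1

# delta[x][d] = probability of taking decision d after observing x,
# i.e. a probability measure on D for each x.
delta = [[Fr(9, 10), Fr(1, 10)],
         [Fr(1, 5), Fr(4, 5)]]

def risk(delta, F):
    """r_delta(F) = sum_x F(x) * sum_d W(F, x, d) * delta(x, d)."""
    return sum(p * sum(W(F, x, d) * delta[x][d] for d in range(2))
               for x, p in enumerate(states[F]))

def max_risk(delta):
    """Supremum of the risk function over the (finite) family of states."""
    return max(risk(delta, F) for F in states)

print(risk(delta, "F_a"), risk(delta, "F_b"), max_risk(delta))
```

Exact rational arithmetic is used so that the two risks can be compared without rounding.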

Let G be a group of transformations on $\mathcal{F} \times \mathcal{X} \times D$ which operates componentwise;
i.e., each $g \in G$ can be written $g = (g_1, g_2, g_3)$ where $g_1, g_2, g_3$ are
transformations on $\mathcal{F}$, $\mathcal{X}$, D, respectively, and where $g(F, x, d) = (g_1 F, g_2 x, g_3 d)$ for all
F, x, d. For simplicity of notation we shall write gF, gx, gd in place of $g_1 F$, $g_2 x$, $g_3 d$;
this will never be ambiguous. G will be a group (not necessarily the largest) which
leaves the problem invariant; i.e., for each $g \in G$, the probability measure of
gX is gF when that of X is F, and W(gF, gx, gd) = W(F, x, d) for all F, x, d.
Of course, it is necessary to impose some measurability restrictions on G: the
elements of G should be measurable transformations on $\mathcal{X} \times D$ (thus, gX is a
random variable); moreover, we assume G to be a measurable group; i.e., there
is a $\sigma$-ring S (closed under differencing and countable intersection but not
necessarily containing G) and a measure $\mu$ on (G, S) such that $g \in G$, $A \in S$ implies
$gA \in S$ and $\mu(gA) = \mu(A)$, and such that the transformation t of $G \times G$ onto
itself defined by t(g, h) = (g, gh) is $S \times S$ measurable. The reader is referred to
[13] for a detailed discussion. We mention here the fundamental existence and
uniqueness theorem, which states that every locally compact Hausdorff group
has such a $\mu$ (left Haar measure) on (G, S) where S = Borel sets of G, such that
$\mu$ is finite on compacta, positive on non-empty open sets, unique to within a
multiplicative constant, and regular. We also impose on W a measurability restriction
which will make such integrals as

$$\int_{\mathcal{X}} \int_{H} \int_{D} W^b(F, x, gr)\, \delta(g^{-1}x, dr)\, \mu(dg)\, F(dx)$$


meaningful in Sec. 3, where $\mu(H) < \infty$ and we define $W^b = \min(W, b)$ for each
positive number b. We also define $r_\delta^b$ to be the risk function of $\delta$ when $W^b$ is the
weight function, and $\bar r_\delta^b = \sup_{F \in \mathcal{F}} r_\delta^b(F)$. We note that assumptions of measurability
and invariance are unaltered when W is replaced by $W^b$. (It is worth noting that
any nondecreasing sequence of measurable invariant functions $W^{*b}$ for which
$W^{*b} \le b$ and $\lim_{b \to \infty} W^{*b} = W$ could be used in place of the $W^b$ throughout this
paper. Thus, in some sequential problems where W is a sum of experimental
cost and loss due to incorrect decisions, it may be more convenient to use a $W^{*b}$
reflecting separate truncation of these two components than to use $W^b$, which
truncates their sum.)

A decision function $\delta$ is said to be invariant if $\delta(gx, gA) = \delta(x, A)$ for all $g \in G$,
$x \in \mathcal{X}$, $A \in B_D$. We denote the class of all invariant decision functions in $\mathcal{D}$ by $\mathcal{D}_I$.

Let $\mathcal{F} = \bigcup_\beta J_\beta$ where $\beta$ ranges over some index set and the $J_\beta$ are equivalence
classes of $\mathcal{F}$ under the equivalence $F_1 \sim F_2$ if $F_1 = gF_2$ for some $g \in G$. Similarly,
let $\mathcal{X} = \bigcup_\alpha K_\alpha$ where the $K_\alpha$ are equivalence classes under $x_1 \sim x_2$ if $x_1 = gx_2$ for
some $g \in G$. The number of elements in each $J_\beta$ (or $K_\alpha$) need not be the same, nor
need there be the same number of $J_\beta$ as $K_\alpha$, etc. We hereafter denote by $F_\beta$ a
fixed member of $J_\beta$.

Remark 1. If $\delta \in \mathcal{D}_I$, clearly $r_\delta$ is constant on each $J_\beta$.

We now list our five assumptions and examples of conditions which imply 
them. 

Assumption 1. For each $\delta$ in $\mathcal{D}$ there is a function $\gamma_\delta$ from $\mathcal{X}$ into G such that,
writing $\gamma_\delta(x) = g_x$ and $g_x^{-1} x = x^*$ (we shall hereafter not display the allowed dependence
on $\delta$), we have $x^* = \bar x^* \in K_\alpha$ if $x, \bar x \in K_\alpha$, and such that for each g in G the function
$\delta_g$ defined by

(2.1) $\delta_g(x, A) = \delta(g x^*, g g_x^{-1} A)$

is in $\mathcal{D}$. (We shall sometimes write $x_\alpha$ for the constant value of $x^*$, $x \in K_\alpha$.)

It may help the reader to see what $\delta_g$ looks like in a simple example. Suppose
$\mathcal{X} = D = G = R^1$ (additive group of reals), so that there is one $K_\alpha$ and we take
$x_\alpha = 0$ and $g_x u = x + u$. If $\delta$ is a nonrandomized estimator, which we may think
of as being a function t from $\mathcal{X}$ into D, the corresponding $\delta_g$ (g a real number) is
the function $t_g$ defined by $t_g(x) = x + t(g) - g$.
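
The construction in this example can be checked numerically. The sketch below builds $t_g(x) = x + t(g) - g$ from an arbitrary real estimator t and confirms that $t_g$ commutes with translations, whatever t is; the particular function `t` and the names `make_t_g` and `g` are hypothetical choices for illustration.

```python
import math

def make_t_g(t, g):
    """Return the estimator t_g(x) = x + t(g) - g built from t and a real g."""
    shift = t(g) - g  # constant offset; does not depend on x
    return lambda x: x + shift

def t(x):
    """An arbitrary estimator that is not translation-equivariant (assumed)."""
    return math.sin(x) + 0.5 * x * x

t_g = make_t_g(t, g=1.3)

# Equivariance under the additive group: t_g(x + h) equals t_g(x) + h.
x, h = 2.0, 0.7
print(t_g(x + h), t_g(x) + h)

# If t is already equivariant, t(x) = x + c, then t_g reproduces t.
t_eq = lambda x: x + 0.25
t_eq_g = make_t_g(t_eq, g=-4.0)
print(t_eq_g(3.0), t_eq(3.0))
```

The second check mirrors the characterization of invariant procedures obtained by putting g = identity in (2.1).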

Remark 2. The measurability portion of Assumption 1 is usually trivial. One
must take care to ascertain that $\mathcal{D}$ is large enough to satisfy the remainder of
the assumption. For example, if $\mathcal{D}$ were taken to be tests of some specified size
$\gamma$ (or $\le \gamma$) in a problem of testing hypotheses, $\delta_g$ might have size $< \gamma$ (or $> \gamma$)
and would not be in $\mathcal{D}$. This situation is easily handled as noted in Condition
2a below. Counterexample B at the end of this section considers another case
where Assumption 1 may be violated.

Assumption 2. For every $\delta$ in $\mathcal{D}$, h in G, d in D, and x,

(2.2) $g_{hx}^{-1} h d = g_x^{-1} d$.

Remark 3. Since $hx \in K_\alpha$ if $x \in K_\alpha$, (2.1) and (2.2) imply

$$\delta_g(hx, hA) = \delta(g x^*, g g_{hx}^{-1} hA) = \delta(g x^*, g g_x^{-1} A) = \delta_g(x, A),$$

so that $\delta_g \in \mathcal{D}_I$ for every g. We thus also note, putting g = identity in (2.1) and
$g = g_x^{-1}$ in the definition of invariant decision function, that a necessary and
sufficient condition for $\delta \in \mathcal{D}_I$ is that $\delta \in \mathcal{D}$ and $\delta(x, A) = \delta(x^*, g_x^{-1} A)$.

Condition 2a. (Testing hypotheses.) Let $\omega$ be a non-empty proper subset of
$\mathcal{F}$ and suppose G leaves both $\omega$ and also $\mathcal{F} - \omega$ invariant. Let D consist of two
elements $d_1$, $d_2$, and suppose $W(F, x, d_2) = c$ if $F \in \omega$, $W(F, x, d_1) = 1$ if
$F \in \mathcal{F} - \omega$, and W = 0 otherwise. If G is such that gX has probability measure
gF when X has F, then G leaves the problem invariant, where G acts trivially²
on D (i.e., $g d_i = d_i$). Hence, Assumption 2 is automatically satisfied. Let $\mathcal{D}$ be
the class of all tests. It is easy to see that as we let c vary from 0 to $\infty$ the class
of minimax procedures (assuming they exist) for the above problem will yield
procedures which maximize the minimum power on $\mathcal{F} - \omega$ among all tests of
size $\gamma$ (or $\le \gamma$) for $0 < \gamma < 1$. An analogous result holds for problems of testing
with general invariant W. In particular, the problem of finding a most stringent
test of size $\gamma$ falls within our framework (see, e.g., [4], [5] for discussion). (Our
use of the term "size $\gamma$" does not entail similarity.)

The above condition can obviously be generalized to include k-decision problems
where $\mathcal{F} = \bigcup_i \omega_i$ and G leaves each $\omega_i$ invariant. (The problem might be
to find a procedure which maximizes the minimum probability of making a
correct decision. In some examples such as ranking problems, G may also permute
the $\omega_i$.)

Condition 2b. For each $\alpha$, $K_\alpha$ is a homogeneous space $G/M_\alpha$, $M_\alpha$ being the
subgroup of G which leaves $x_\alpha$ fixed (see, e.g., [14]), where $M_\alpha$ acts trivially on
D. (A particular important instance of this condition, hereafter denoted Condition
2b′, is that where $\mathcal{X} = Y \times Z$, Y being a homogeneous space G/M where M
is the subgroup of G leaving some element $y_0$ of Y fixed, M acts trivially on D,
and G acts trivially on Z. In this case we can write gx = g(y, z) = (gy, z), and we
can identify the index $\alpha$ with values $z \in Z$ since G is transitive on Y and trivial
on Z. Some examples where this condition is satisfied will be considered at the
end of this section.) To see that Condition 2b implies that (2.2) is satisfied, we
note that $x \in K_\alpha$ implies that $g = g_{hx}^{-1} h$ takes x into $x_\alpha$, so that $g g_x$ leaves $x_\alpha$ fixed
and is thus some element $m_\alpha$ of $M_\alpha$. Hence $g = m_\alpha g_x^{-1}$ and, since $M_\alpha$ acts
trivially on D, $gd = m_\alpha g_x^{-1} d = g_x^{-1} d$, which is (2.2).

Remark 4. Peisakoff assumes, in the notation of Condition 2b′, that Y is
isomorphic to G and that $\mathcal{F}$ consists of the possible probability measures of gX
for $g \in G$ when X has a given probability measure $F_0$ (thus, we may think of G
as being the "parameter space," too). This special case of Condition 2b′ we hereafter
refer to as Condition 2bp (see also Example iv below). Note that in Condition
2b (2b′), $M_\alpha$ (M) need not be normal in G, so $K_\alpha$ (Y) need not be a subgroup
of G. Of course, G might be either "larger" or "smaller" than $\mathcal{F}$, which will be
partly reflected by the $J_\beta$.

² Throughout this paper we shall say that G acts trivially on D or a factor of D if the
appropriate component of every g in G is the identity transformation.

Remark 5. It is convenient at this point to discuss the question of whether
or not it is necessary to consider, as we have, randomized decision functions.
We discuss this without consideration of questions of atomicity, our interest
here being in the relationship of G and $\mathcal{F}$ to randomization. Suppose, for
example, that the following condition were satisfied:

Condition NR. G is transitive on $\mathcal{F}$.

Let $\hat\alpha$ be defined by $\hat\alpha = \alpha$ when $X \in K_\alpha$. Define $X^*$ by $X^* = x^*$ if X = x.
It will usually be a trivial measurability verification to see that $\hat\alpha$ and $X^*$ are
random variables. If Assumptions 1 and 2 and Condition NR are satisfied,
$\delta \in \mathcal{D}_I$ implies (see Remark 1) that $r_\delta$ is constant and ($F_0$ being any fixed member
of $\mathcal{F}$) equal to

$$r_\delta(F_0) = E_{F_0} E_{F_0}\Big\{ \int_D W(F_0, X, s)\, \delta(X, ds) \,\Big|\, \hat\alpha \Big\}$$

$$= E_{F_0} E_{F_0}\Big\{ \int_D W(F_0, X, g_X s)\, \delta(X^*, ds) \,\Big|\, \hat\alpha \Big\}$$

$$\ge E_{F_0} \inf_s E_{F_0}\{ W(F_0, X, g_X s) \mid \hat\alpha \},$$

where the invariance of $\delta$ has been used in passing from the second expression to
the third. Thus, whether or not the infimum in the last expression is attained,
there clearly exists a function $s^*$ of $\alpha$ into D such that, if $\delta^*$ is the nonrandomized
decision function defined by $\delta^*(x, g_x s^*(\alpha)) = 1$ when $x \in K_\alpha$ (we can think of
$\delta^*$ as a function from $\mathcal{X}$ into D which takes on the value $g_x s^*(\alpha)$ when $x \in K_\alpha$),
then $r_{\delta^*}(F_0) \le r_\delta(F_0)$, provided only that there are no measurability difficulties
in defining the function $s^*$. We shall not go into the last provision, remarking only
that mild semi-continuity restrictions on W would suffice and that one could
even avoid any measurability considerations by defining risk as an outer integral
for "nonmeasurable decision functions." In order to show that, for minimax
considerations, one can do as well with the nonrandomized members of $\mathcal{D}_I$ as with
all of $\mathcal{D}_I$, it remains to show that $\delta^* \in \mathcal{D}_I$; this follows at once upon noting that
$\delta^*$ satisfies the condition given in the last sentence of Remark 3.

In [1] and [7], the authors restrict their consideration to nonrandomized decision
functions; we note that Condition NR is satisfied in [1] and [7]. In general,
one cannot dispense with randomization, as can be seen from many examples
where G is not transitive on $\mathcal{F}$. For example, in estimating the mean $\theta$ of a
binomial distribution ($0 < \theta < 1$) with $W(\theta, x, d) = |\theta - d|^p$ with $0 < p < 1$,
the only minimax procedures are randomized (see [10]); G consists of two elements
here. In many discrete problems of testing hypotheses randomization will
also be necessary.

We note that a $\delta^*$ formed from an $s^*$ which achieves the infimum (w.p.1 under
$F_0$) above is obviously a uniformly minimum risk decision function among members
of $\mathcal{D}_I$: thus, if Condition NR is satisfied in addition to Assumptions 1 to 5,
this gives a prescription for explicitly writing down a minimax procedure. A
similar remark applies to $\epsilon$-minimax procedures if the infimum is not attained.

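Assumption 3. For each b there is a subset $\Gamma_b$ of $\mathcal{F}$ with $\Gamma_b \supset \{F_\beta\}$, and a family
$\mathcal{S}_b$ of probability measures on $\Gamma_b$ which includes each measure giving probability
one to a single element of $\Gamma_b$, such that

(2.3) $\inf_{\delta \in \mathcal{D}_I} \sup_{\xi \in \mathcal{S}_b} r_\delta^b(\xi) = \sup_{\xi \in \mathcal{S}_b} \inf_{\delta \in \mathcal{D}_I} r_\delta^b(\xi)$,

where $r_\delta^b(\xi)$ is the expected value of $r_\delta^b$ with respect to the probability measure $\xi$ on $\mathcal{F}$.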

Remark 6. Whenever $\{F_\beta\}$ is finite (e.g., if G is transitive on $\omega$ and $\mathcal{F} - \omega$
or on each $\omega_i$ in the case of Condition 2a or is transitive on $\mathcal{F}$ in 2b; see also
Example vi below), if also $\mathcal{D}$ (or merely $\mathcal{D}_I$) is convex, (2.3) is trivial. In many
other cases it may suffice to let $\mathcal{S}_b$ be a family of totally atomic (discrete) measures,
so that no measurability difficulty arises in defining $r_\delta^b(\xi)$ (see, e.g., [15],
[16]). If one tries to verify (2.3) using an $\mathcal{S}_b$ containing more general measures
with respect to some Borel field on $\mathcal{F}$, one must also make sure that conditions
implying the existence of the integral $r_\delta^b(\xi)$ are satisfied for these $\xi$.

It is not clear how essential Assumption 3 is to the validity of the theorem
of Sec. 3; it will be seen there that it is used because (3.3) cannot in general be
verified if integration with respect to $\xi$ is replaced by a supremum over $\Gamma_b$ there
(Counterexamples A to D at the end of this section show that none of our other
four assumptions can be entirely dispensed with). The reason for not necessarily
putting $\Gamma_b = \{F_\beta\}$ is that (2.3) may sometimes be more obvious for a larger
$\Gamma_b$ than for $\{F_\beta\}$.

Assumption 4. We assume that

(2.4) $\lim_{b \to \infty} \inf_{\delta \in \mathcal{D}_I} \bar r_\delta^b = \inf_{\delta \in \mathcal{D}_I} \bar r_\delta$.

(By monotone convergence, the right side of (2.4) is equal to the left with the
operations of limit and infimum interchanged.)
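
To make the content of (2.4) concrete, the following sketch evaluates the left side for increasing truncation levels in a deliberately simple setting: a location problem on the real line in which the invariant nonrandomized procedures are indexed by a shift t, the loss is squared error truncated at b, and Q is a three-point distribution. All of these choices (the points `ys`, the weights `probs`, the grid `ts`, and squared-error loss) are assumptions made for illustration, not taken from the paper.

```python
def truncated_min_risk(ys, probs, b, ts):
    """inf over t in ts of E_Q min((y + t)^2, b) for a discrete distribution Q."""
    return min(
        sum(p * min((y + t) ** 2, b) for y, p in zip(ys, probs))
        for t in ts
    )

ys = [-1.0, 0.0, 2.0]        # support of Q (hypothetical)
probs = [0.25, 0.5, 0.25]    # probabilities under Q
ts = [k / 100 for k in range(-300, 301)]  # candidate shifts t in [-3, 3]

levels = (0.5, 1, 2, 4, 8, 16)
values = [truncated_min_risk(ys, probs, b, ts) for b in levels]

# Untruncated minimum risk is the variance of Q, attained at t = -E_Q[y].
mean = sum(p * y for y, p in zip(ys, probs))
variance = sum(p * (y - mean) ** 2 for y, p in zip(ys, probs))

for b, v in zip(levels, values):
    print(b, v)
print("untruncated:", variance)
```

The printed values are nondecreasing in b and reach the untruncated minimum once b exceeds the largest loss incurred near the optimal shift, which is the behavior (2.4) asks for.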

Condition 4a. If W is bounded, (2.4) is trivial. This condition will usually be
satisfied in problems of testing hypotheses and interval estimation.

Condition 4b. The following set of conditions is not the most general possible
of this type, but covers many important cases such as the examples of this section
and Sec. 4 for many commonly employed unbounded W. We assume that G is a
topological group satisfying Condition 2bp, that D = G, and (writing $\mathcal{F}$ as G)
that W(g, (y, z), d) does not depend on y and may hence be written $W(g^{-1}d, z)$.
We also assume for each z the existence of an increasing sequence $\{U_r^z\}$ of compact
sets whose limit is G and such that every compact subset of G is in some
$U_r^z$, that W(h, z) is bounded in h in each $U_r^z$ and tends to $\infty$ uniformly in h for
$h \notin U_r^z$ as $r \to \infty$, and such that, for each $r_0$ and $r_1$, the set $U_{r_0}^z (G - U_n^z)$ (group
multiplication) is disjoint from $U_{r_1}^z$ for all sufficiently large n. We also assume
regularity conditions on W of the type mentioned in the discussion of Condition NR
and that there exist (as there will if $(\mathcal{X}, B_{\mathcal{X}})$ is Euclidean with the Borel sets or a
countable product of such spaces) conditional probability measures on Y given the
Z-coordinate of X. Let $F_0$ denote the probability measure of X, and let $\bar F_0$ denote
the probability measure of the Z-coordinate of X, when the element g of $\mathcal{F}$ is the
identity; and let $F_0(A \mid z)$ denote a version of the conditional probability measure
on Y, evaluated at the set $A \subset Y$, given that the Z-coordinate of X is z. Our final
assumption of Condition 4b is that the compact and sequentially compact subsets
of G coincide (this is clearly removable if the next phrase is appropriately
restated) and that, for each $g_0 \in G$, $g_i \to g_0$ implies $\liminf W(y g_i, z) \ge W(y g_0, z)$
w.p.1 under $F_0$.

The above condition is not as complicated as it may first seem: for example,
if G is the additive group $R^m$ and W is for each z bounded on bounded sets and
$\infty$ at $\infty$, we can take $U_r^z$ to be the sphere of radius r centered at the origin.

We now verify that Condition 4b implies Assumption 4. If Condition 4b is
satisfied, so is Condition NR, and we can restrict ourselves to nonrandomized
members of $\mathcal{D}_I$ in computing either side of (2.4). According to Remark 3 and the
discussion of Condition 2b′, these are functions from $\mathcal{X}$ onto D of the form
$t'(y, z) = y\,t(z)$ where t is an arbitrary measurable function from Z into D. We hereafter
label nonrandomized members of $\mathcal{D}_I$ by t in place of $\delta$. Since $r_\delta(F) = r_\delta(F_0)$ for
$\delta \in \mathcal{D}_I$, we have

$$\bar r_t^b = \int_Z \int_Y W^b(y\,t(z), z)\, F_0(dy \mid z)\, \bar F_0(dz),$$

the same equation holding with no superscript b. Thus, if we show that

$$\lim_{b \to \infty} \inf_{t \in G} \int_Y W^b(yt, z)\, F_0(dy \mid z) = \inf_{t \in G} \int_Y W(yt, z)\, F_0(dy \mid z)$$

for each fixed z, (2.4) will follow from monotone convergence. Thus, we may
neglect a set of $\bar F_0$-measure 0 and delete the z, and it will then suffice to prove that,
if W is a nonnegative function on G, satisfying the conditions assumed above for
each z in a set of $\bar F_0$-measure one, and Q is a fixed probability measure on Y = G,
and if $\epsilon > 0$ and

$$0 \le q < \inf_{t \in G} \int_Y W(yt)\, Q(dy),$$

then there is a B such that b > B implies

$$\int_Y W^b(yt)\, Q(dy) > q(1 - \epsilon) \quad \text{for all } t \in G.$$

We hereafter denote integration with respect to Q (over y) by $E_Q$. First let $U_0$
be a compact subset of Y with $Q(U_0) > 1 - \epsilon$, and let $V_b = \{y \mid W(y) \le b\}$.
Thus, the closure of $V_b$ is compact. By our assumption, there is a compact set
$U_{r_1}$ of $\{U_r\}$ such that $y \in U_0$ and $t \notin U_{r_1}$ imply $yt \notin V_b$. Hence, $t \notin U_{r_1}$ and b > q
imply that $E_Q W^b(yt) > q(1 - \epsilon)$, and it remains to show that $t \in U_{r_1}$ and b > B′
for some B′ imply the same result. Let $t_b \in U_{r_1}$ be chosen so that $E_Q W^b(y t_b) <
b^{-1} + \inf_{t \in U_{r_1}} E_Q W^b(yt)$ and let $\{t_{b_i}, i = 1, 2, \ldots\}$ be a subsequence of $\{t_b\}$
with limit t′ (say). Then, for each r, since $U_r U_{r_1}$ (group multiplication) is compact,

$$\lim_{b \to \infty} \inf_{t \in U_{r_1}} E_Q W^b(yt) = \lim_{i \to \infty} E_Q W^{b_i}(y t_{b_i}) \ge \liminf_{i \to \infty} \int_{U_r} W^{b_i}(y t_{b_i})\, Q(dy)$$

$$= \liminf_{i \to \infty} \int_{U_r} W(y t_{b_i})\, Q(dy) \ge \int_{U_r} W(y t')\, Q(dy).$$

Letting $r \to \infty$, the last member must tend to a value $\ge q$, completing the proof
that Condition 4b implies (2.4).

Condition 4c. For brevity we state this for the case where $\mathcal{F}$, $\mathcal{X}$, D are as in
Condition 4b with G = additive group $R^1$, but it is easily generalized to versions
for other groups. Writing the group operation as addition, we again assume W
to be of the form W(d - g, z), but now assume for each z that $W(y, z) \le L_z < \infty$
if $y \le e_z$, that $W(y, z) \to c_z$ as $y \to -\infty$, and that, for $y \ge e_z$, W(y, z) is finite
and nondecreasing and $\to \infty$ as $y \to \infty$. We also assume as before that, for each
real t, $t_i \to t$ implies $\liminf W(y + t_i, z) \ge W(y + t, z)$ w.p.1 under $F_0$. Finally,
we assume that there exists at least one member $\delta_0$ of $\mathcal{D}_I$ for which $\bar r_{\delta_0} < \infty$.

As in the consideration of Condition 4b, by neglecting a set of $\bar F_0$-measure
zero, we can reduce the problem to proving that $\inf_t E_Q W^b(y + t) \to
\inf_t E_Q W(y + t)$ where $W(y) \to c$ as $y \to -\infty$, $W(y) \le L < \infty$ for $y \le e$, W(y)
is nondecreasing for $y \ge e$ and $\to \infty$ as $y \to \infty$, $t_i \to t$ implies $\liminf W(t_i + y) \ge
W(t + y)$ w.p.1 under Q, and $\inf_t E_Q W(y + t) = q < \infty$. Clearly, for some $B_1$
and $T_1$, $b > B_1$ and $t > T_1$ imply $E_Q W^b(y + t) > q$. Also, letting $t_0$ be any value
for which $E_Q W(y + t_0) < \infty$, since $W(y + t) \le L + W(y + t_0)$ for $t < t_0$ and
all y, we obtain $E_Q W(y + t) \to c$ as $t \to -\infty$ by bounded convergence. Thus,
$q \le c$. Obviously, for $\epsilon > 0$, b > L and $t < T_2(\epsilon)$ imply $E_Q W^b(y + t) > c - \epsilon
\ge q - \epsilon$. To summarize then, it remains to prove that

$$\lim_{b \to \infty} \inf_{T_2 \le t \le T_1} E_Q W^b(y + t) \ge q,$$

where $T_1$ and $T_2$ are finite. This case is treated in the same way the case $t \in U_{r_1}$
was treated in Condition 4b. Thus, Condition 4c implies (2.4).

The form of our next assumption is Peisakoff's; he calls it "weak boundedness."
As usual, we denote $(A - B) \cup (B - A)$ by $A \Delta B$.

Assumption 5. There exists a sequence $\{G_n\}$ of measurable subsets of G with
$0 < \mu(G_n) < \infty$ and such that, for each g in G,

(2.5) $\lim_{n \to \infty} \mu(gG_n \Delta G_n)/\mu(G_n) = 0$.


Condition 5a. If G is compact Hausdorff, we can take $G_n = G$ and Assumption
5 is satisfied.

Condition 5b. Peisakoff [1] also gives the following examples of groups
satisfying Assumption 5:

(1) G = additive group of $R^m$ (take $G_n$ to be the cube of side n, centered at 0).

(2) G = real affine group; here an element of G is a pair (b, c) with b positive
and c real and (b, c)(b′, c′) = (bb′, bc′ + c), and $d\mu = db\,dc/b^2$; in [1], (2.5) is
verified directly if G, is taken to be the set where | c/b| < e” ande” Sb S e”. 
A less computational verification can be obtained using Condition 5d below.
Peisakoff attempts to show that the full linear group GL(n) also satisfies (2.5),
but his proof seems to be incorrect (see also Counterexample D cited below).
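
For the simplest case covered by Condition 5b(1), the additive group of the real line with Lebesgue measure as Haar measure, the ratio in (2.5) can be written down in closed form. The sketch below assumes $G_n$ is the interval of length n centered at 0 (the one-dimensional "cube of side n"); the function name `folner_ratio` is a hypothetical label.

```python
def folner_ratio(g, n):
    """mu((g + G_n) symmetric-difference G_n) / mu(G_n) for G_n = [-n/2, n/2] in R^1.

    The translate g + G_n overlaps G_n in an interval of length max(0, n - |g|),
    so the symmetric difference has length 2 * (n - overlap).
    """
    overlap = max(0.0, n - abs(g))
    return 2.0 * (n - overlap) / n

g = 3.0
for n in (1, 6, 60, 600, 6000):
    print(n, folner_ratio(g, n))
```

For fixed g the ratio equals 2|g|/n as soon as n exceeds |g|, so it tends to 0 as (2.5) requires.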

Condition 5c. G satisfies Assumption 5 if it is the direct product of two groups
satisfying Assumption 5. We omit the obvious proof.

Condition 5c can be used, for example, if G is a direct product of real affine
groups. Another example (see Example iv below) is that where G is the direct
product of the multiplicative group of positive numbers (scale group) and the
orthogonal group O(n) on $R^n$. In connection with this last example, note that the
two factors which generate the group, considered as subgroups of G, of course
commute; it is instructive to contrast this or the proof of Condition 5d below
with the difficulty one encounters if one tries to verify Assumption 5 for GL(n)
by representing an element of the group as (for example) $Q_1 P$ or $Q_1 D Q_2$ where
$Q_1$, $Q_2$ are orthogonal, D is diagonal, and P is positive definite. We next prove
that G satisfies (2.5) if a slight strengthening of (2.5) is satisfied for a normal
subgroup and factor group of G. This can be used in examples such as that of
Condition 5b(2), Example vi, etc.

Condition 5d. Suppose a locally compact G has a closed normal subgroup
$G^{(1)}$ with factor group $G^{(2)} = G/G^{(1)}$; that for i = 1, 2 there is an increasing
sequence $\{Q_n^{(i)}\}$ of sets whose union is $G^{(i)}$ and such that $Q_n^{(i)}$ has compact closure
and any compact subset of $G^{(i)}$ is in some $Q_n^{(i)}$; that there is a sequence $\{G_n^{(i)}\}$
of measurable subsets of $G^{(i)}$ such that $G_n^{(i)}$ has compact closure; and that m > n
and $g^{(i)} \in Q_n^{(i)}$ imply $\mu^{(i)}(g^{(i)} G_m^{(i)} \cap G_m^{(i)}) > (1 - \epsilon_n)\,\mu^{(i)}(G_m^{(i)})$ for some sequence
$\{\epsilon_n\}$ with $\lim_n \epsilon_n = 0$, where $\mu^{(i)}$ is a left Haar measure on $G^{(i)}$. Under these
conditions we shall show that G satisfies Assumption 5. Let v(m) > m be such
that 77g ¢ QS) if g? © QS and +r eG (the set of all such 7 “gr is con- 
tained in a compact set). We shall show that G,, = GOGS2.) satisfies Assumption 
5. For let ¢ > 0 and let g = gg” be an arbitrary element of G. Choose n so 
that g «QS and (1 — «)(1 — em) = 1 — «. Since (see Sec. 63 of [13] and 
the references cited there) u(E) = f u?([77E]n G™)y(dr), and since 
(rg) (79 Ge Ey 1 EO = 171g 1G.) if 77g? GS contains the identity 
and = the empty set otherwise (where r ¢e G™), we have, for m > n, 


ulgGn 0 Ga) = wg 9? GoGse, 2 GOGre» 


@,_—1 UW) (1) (1) (2) 
io ay ana ab (r g TGs (m) n Gy (m)) (dr) 
g 6G,’ NG, 


(1 — €n)(1 — em) ue (GSQ) uw (GE) = (1 — OulG,), 


proving our assertion. (It is easy to extend Condition 5d to more factors.) 

Examples. We list briefly a few examples (of estimation except for Example
vi) to illustrate some of the concepts of this section. In each case W will be
assumed to satisfy appropriate conditions which will be obvious, and the possible
choices of $\mathcal{D}$ will be evident if not stated.

(i) (Location parameter) $\mathcal{X} = R^n$ and, $\epsilon$ denoting the n-vector (1, ..., 1),
$X = (X_1, \ldots, X_n)$ has c.d.f. $F_0(x - \theta\epsilon)$ for some $\theta \in R^1$ (identified with $\mathcal{F}$),
the form of $F_0$ being known. Here Condition 2bp is satisfied with $Y = R^1$ = space
of $X_1$, and $Z = R^{n-1}$ = space of $X_2 - X_1, \ldots, X_n - X_1$.

(i′) (Scale parameter) Let $R^{*n}$ be the subset of $R^n$ where no coordinate is
zero. For simplicity we assume $R^n - R^{*n}$ has probability zero according to every
element of $\mathcal{F}$, so that we can take $\mathcal{X} = R^{*n}$. Here $\mathcal{F}$ is identified with the positive
reals and $X = (X_1, \ldots, X_n)$ has c.d.f. $F_0(x/\theta)$ for some $\theta > 0$. Letting $X_i' = \log
|X_i|$, $t_i = \operatorname{sgn} X_i$, $t = (t_1, \ldots, t_n)$ and $\theta' = \log\theta$, this problem can be
transformed to that considered in Example i with the trivial and inessential
modification that the sample space is $R^n \times T$ where T, the space of $2^n$ possible values
of t, is acted on trivially by G. (The case where $R^n - R^{*n}$ has positive probability
is handled similarly by considering $\mathcal{X}$ to be the union of subspaces $\mathcal{X}_i$ ($0 \le i \le n$),
where $X_1 = \cdots = X_i = 0$ and $X_{i+1} \ne 0$ in $\mathcal{X}_i$. A similar remark applies in
other examples.)

(ii) (Scale and location parameters). Let $R^{**n}$ be the subset of $R^n$ where no
two coordinates are equal and $n \ge 2$ (see also Example v). All elements of $\mathcal{F}$ will
give probability one to $R^{**n}$, which we take to be $\mathcal{X}$. $\mathcal{F}$ will be identified with
G = real affine group, and $X = (X_1, \ldots, X_n)$ has d.f. $F_0((x - \theta_1\epsilon)/\theta_2)$ for
some $\theta_2 > 0$ and real $\theta_1$. Condition 2bp is satisfied if we take Y to be the space
of $(X_1, |X_1 - X_2|)$ and Z to be the space of $\operatorname{sgn}(X_1 - X_2)$ and
$(X_1 - X_i)/|X_1 - X_2|$, $3 \le i \le n$.
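
The decomposition used in Example i can be sketched in a few lines. The code below splits a sample into the coordinate $y = x_1$, on which the translation group acts, and the vector of differences z, which is unchanged by translation, and then builds an estimator of the form $y + t(z)$ from an arbitrary function t of z. The function names are hypothetical, and the sample mean is used only as a familiar instance of this form.

```python
def split_location(x):
    """Return (y, z) with y = x_1 and z = (x_2 - x_1, ..., x_n - x_1)."""
    y = x[0]
    z = tuple(xi - y for xi in x[1:])
    return y, z

def equivariant_estimator(t):
    """Build the estimator x -> y + t(z) from an arbitrary function t of z."""
    def estimate(x):
        y, z = split_location(x)
        return y + t(z)
    return estimate

# The sample mean written in this form: mean = x_1 + (1/n) * sum_i (x_i - x_1).
mean_est = equivariant_estimator(lambda z: sum(z) / (len(z) + 1))

x = (1.0, 2.0, 6.0)
shifted = tuple(xi + 10.0 for xi in x)

print(split_location(x)[1], split_location(shifted)[1])  # same z for both samples
print(mean_est(x), mean_est(shifted))                    # estimates differ by the shift
```

Because z does not change under a common shift of the observations, any estimator of this form moves by exactly the amount of the shift.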

In the above examples, if $F_0$ and W have additional symmetry properties, a
larger group might leave the problem invariant. Our next two examples illustrate 
this possibility. 

(iii) Consider the setup of Example i with $D = R^1$, $F_0(x)$ symmetric about 0,
and W a symmetric function of $\theta - d$ satisfying Assumption 4. As in Example i,
the group $G^{(1)}$ = additive group of reals leaves the problem invariant; but so
does the larger group G* = direct product of $G^{(1)}$ and $G^{(2)}$ where $G^{(2)}$ consists of
the identity element and an element which takes x, $\theta$, and d into their negatives.
We cannot apply Condition 2b here with G = G* since $G^{(2)}$ does not act trivially
on D. However, we can apply Condition 2b (or even 2bp) with $G = G^{(1)}$ and then
make a trivial application of the Hunt and Stein method in order to assert that,
if $\delta^*(x, A) = \frac{1}{2}[\delta(x, A) + \delta(-x, -A)]$, then $\bar r_{\delta^*} \le \bar r_\delta$; thus, we can conclude that
the conclusion of the theorem of Section 3 holds with G = G*. Note that we
cannot conclude that there will be a G*-invariant minimax (or $\epsilon$-minimax)
nonrandomized procedure, since Assumption 2 is violated for G*; indeed, without
some monotonicity restriction on the density of $F_0$ and on W (which would yield
this result) this conclusion is false, as can be seen from consideration of the
weight function W = 0 if $2 < |\theta - d| < 3$ and W = 1 otherwise when $F_0(x)$
is normal with mean 0 and variance 1.
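
The closing weight function of Example iii can be examined numerically in the simplest case, assumed here to be a single observation (n = 1). The translation-invariant nonrandomized estimators are then t(x) = x + c, and invariance under the sign change as well forces c = 0. The sketch computes the constant risk of x + c and compares c = 0 with the randomized procedure that uses x + c or x - c with probability one half each, which is invariant under both translations and the sign change. The names `Phi` and `risk_shift` and the value c = 2.5 are illustrative choices.

```python
import math

def Phi(u):
    """Standard normal c.d.f."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def risk_shift(c):
    """Risk of t(x) = x + c when W = 0 for 2 < |theta - d| < 3 and W = 1 otherwise.

    With Z = X - theta standard normal, the loss vanishes when 2 < |Z + c| < 3,
    so the risk is 1 - P(2 < |Z + c| < 3) and does not depend on theta.
    """
    hit = (Phi(3.0 - c) - Phi(2.0 - c)) + (Phi(-2.0 - c) - Phi(-3.0 - c))
    return 1.0 - hit

c = 2.5
nonrandomized_symmetric = risk_shift(0.0)
randomized_symmetric = 0.5 * (risk_shift(c) + risk_shift(-c))

print(nonrandomized_symmetric, randomized_symmetric)
```

Under these assumptions the estimator t(x) = x has risk of roughly 0.96, while the symmetric randomized procedure has risk of roughly 0.62, in line with the statement that the restriction to nonrandomized procedures can fail for G*.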

The advantage of obtaining the conclusion of the theorem of Section 3 for
G = G* instead of merely $G = G^{(1)}$ in examples of the above variety is, of course,
that there are fewer G*-invariant procedures than $G^{(1)}$-invariant procedures
among which we must search for a minimax procedure. Although we would
therefore usually like to take G as large as possible, the above example illustrates
that the apparent reduction obtained in using G* in place of a smaller $G^{(1)}$ may
in some cases only be illusory, since we may lose the reduction to nonrandomized
procedures in passing from $G^{(1)}$ to G*. However, the example might suggest that
the method of Hunt and Stein, used ab initio, would result in a simpler treatment.
Counterexample C below shows, though, that the use of that method also could
not avoid the verification of something like Assumption 2 for the non-compact
factor of G. In the following remark we summarize the general result obtained
by using the Hunt and Stein method as in Example iii.

Remark 7. If G* is the direct product of $G^{(1)}$ and $G^{(2)}$ where $G^{(2)}$ is compact
Hausdorff and where the conclusion of the theorem of Sec. 3 is valid for $G = G^{(1)}$,
then that conclusion is valid for G = G*.

(iv) $\mathcal{X}$ is $R^n$ minus the origin of $R^n$, while $\mathcal{F}$ is the set of c.d.f.'s $F_0(x/\theta)$ for
$\theta > 0$ where under $\theta = 1$ the $X_i$ are independent and normal with mean 0 and
variance 1. D is the set of positive reals and, e.g., W is a function of $d/\theta$. We
can take G to be the group cited as the second example under Condition 5c.
Here $\mathcal{X} = G/O(n - 1)$ (we can think of O(n - 1) as leaving the point (1, 0,
..., 0) fixed) and O(n - 1) (in fact, O(n)) acts trivially on D. Thus, Condition
2b′ is satisfied. Since Condition NR is also satisfied, our search for a
minimax procedure is reduced to considering nonrandomized estimators of the
form $c\,(\sum_i X_i^2)^{1/2}$, where the constant c is chosen to minimize the risk.
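
The final minimization over c in Example iv can be carried out explicitly once a weight function is fixed. The sketch below makes two assumptions that are not in the text: the estimator is taken to be c times $S = (\sum_i X_i^2)^{1/2}$, the form that scales with $\theta$, and the weight function is $W = (d/\theta - 1)^2$. Under $\theta = 1$ the risk is $E(cS - 1)^2$, a quadratic in c minimized at $c = E[S]/E[S^2]$, where $E[S^2] = n$ and $E[S]$ is the mean of a chi distribution with n degrees of freedom.

```python
import math

def mean_norm(n):
    """E[(sum of n squared standard normals)^(1/2)], the chi-distribution mean."""
    return math.sqrt(2.0) * math.gamma((n + 1) / 2) / math.gamma(n / 2)

def best_c(n):
    """Minimizer of E(c*S - 1)^2 over c: c = E[S] / E[S^2], with E[S^2] = n."""
    return mean_norm(n) / n

def risk(c, n):
    """E(c*S - 1)^2 = c^2 * n - 2 * c * E[S] + 1 under theta = 1."""
    return c * c * n - 2.0 * c * mean_norm(n) + 1.0

for n in (1, 2, 5, 20):
    c = best_c(n)
    print(n, c, risk(c, n))
```

A different invariant weight function would lead to a different constant; only the reduction to a one-parameter minimization is taken from the example.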

Note that Condition 2bp cannot be satisfied for the G used in the above 
example. If we had treated the example as a case of Example i’ so as to use 
Condition 2bp, we would have ended up searching through a much larger class 
of procedures unless we invoke some further principle such as that of sufficiency 
(in a manner similar to that of Sec. 4). We remark that Peisakoff indicated 
another method which could be used in some examples such as this one when 
one wants to use Condition 2bp: Let Q be a random variable independent of 
X and uniformly distributed on the component O*(n) of the identity of O(n), 
and apply Condition 2bp to the G considered above on the sample space 
$R^{*n} \times O^*(n)$ of X′ = (X, Q). The disadvantage of using this technique, where
it is possible to do so, is that in some examples further considerations may be 
required to reduce the class of invariant procedures to that which would have 
been obtained if Condition 2b’ had been used directly. Note that the technique 
used here is really related to that of Remark 7, which would give the desired 
result more directly here, but which would still be inferior to the direct use of 
Condition 2b’ which does not require the technique of Remark 7 in the present 
example. 

(v) $\mathcal{X}$ and $G$ are as in Example iii, but with $n = 1$. $D = R^1$, the object being
to estimate $\theta_1$. The weight function is, e.g., a function of $(d - \theta_1)/\theta_2$, which we
hereafter take to be the argument of $W$. There is one $K_a$, and if we try to verify
Condition 2b' we run into trouble. For example, take $x_a = 0$ so that $M$ is the
multiplicative group of reals (not normal in $G$) and $\mathcal{X} = G/M$; $M$ does not act
trivially on $D$, so Condition 2b' is not satisfied. If we consider this example as
a case of Example i (i.e., let $G$ be the smaller group used there), we obtain
for $\mathcal{D}_I$, the class $C$ of procedures $\delta$ for which $\delta(x, A + x) = \delta(0, A)$. If the
conclusion of the theorem of Sec. 3 were valid for $G$ = affine group, we could restrict
ourselves to those members of $C$ for which $\delta(x, A) = \delta(ax, aA)$ for all $a > 0$;
putting $x = 0$, this means $\delta(0, A) = \delta(0, aA)$ for all $a > 0$; taking $A$ to be the
interval $(-1, 1)$, this means $\delta(0, 0) = 1$; noting the equation defining $C$, this
means that there is only one invariant procedure $\delta^*$ under the affine group, the
nonrandomized estimator $t(x) = x$. One would like to conclude that this estimator
is minimax. If $\liminf_{t \to 0} W(t + X) \ge W(X)$ w.p.1 when $(\theta_1, \theta_2) = (0, 1)$, an
application of Fatou's lemma to the equation

$r_\delta(\theta_1, \theta_2) = \int\!\!\int W\!\left(\frac{u}{\theta_2} + x\right) F_0(dx)\, \delta(0, du)$  if $\delta \in C$

yields the fact that $\liminf_{\theta_2 \to \infty} r_\delta(\theta_1, \theta_2) \ge r_{\delta^*}$ (= constant) for $\delta \in C$, and the
conclusion that $\delta^*$ is minimax is justified. However, Counterexample C below
shows that without some such additional assumption as the one made here on
$W$, this conclusion is false: $\delta^*$ need not be minimax and we can only conclude
that there is a $\delta \in C$ which is minimax (or $\varepsilon$-minimax).
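
The limiting argument of Example v can be checked in a special case. The sketch below is only an illustration under assumptions that are not in the text: $F_0$ is taken to be standard normal, $W(t) = t^2$, and the member of $C$ considered is the nonrandomized estimator $x + u_0$ for a fixed shift $u_0$ (the function name `risk_shifted` is hypothetical). Its risk is then $E\,W(Z + u_0/\theta_2) = 1 + (u_0/\theta_2)^2$, which decreases to the constant risk 1 of $t(x) = x$ as $\theta_2$ grows.

```python
def risk_shifted(u0, theta2):
    """Risk of the estimator x + u0 at scale theta2.

    Assumptions (illustrative only): F_0 is standard normal and W(t) = t**2,
    so the risk is E[(Z + u0/theta2)**2] = Var(Z) + (u0/theta2)**2.
    """
    return 1.0 + (u0 / theta2) ** 2


# Risk of the affine-invariant estimator t(x) = x (shift u0 = 0); constant in theta.
RISK_STAR = risk_shifted(0.0, 1.0)

if __name__ == "__main__":
    for theta2 in (1.0, 10.0, 100.0, 1000.0):
        # The excess over RISK_STAR shrinks like (u0/theta2)**2.
        print(theta2, risk_shifted(2.0, theta2) - RISK_STAR)
```

In this special case every shifted estimator has supremum risk at least that of $t(x) = x$, in line with the conclusion reached under the semicontinuity assumption on $W$.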

(vi) The univariate general linear hypothesis (GLH) is discussed in detail in 
many places. If y is the parameter on which the power function of the usual 
F-test of specified size « depends, it is easily proved (see, e.g., [5a]) that this test 
is uniformly most powerful invariant of size « of the GLH y = 0 (against y > 0). 
There are several ways to apply the theorem of the next section to conclude, 
e.g., that this test is most stringent of size « (first proved by Hunt and Stein). 
One is to consider for fixed $y_0 > 0$ the problem of testing $y = 0$ against $y = y_0$,
to note that $G$ is transitive on $\omega$ and on $\mathcal{F} - \omega$ in this case so that Assumption
3 is satisfied (as are the other assumptions), and thus to conclude that the above 
test is most stringent of size e; since this is true for every yo > 0, it follows that 
the test is most stringent for the original GLH. Another method (better than the 
above in other problems where such a property uniform in yo may not hold) is to 
verify Assumption 3 directly for GLH; we can do this easily by applying the 
theory of [15] to the present case. Alternatively, (2.3) can be verified by
considering, on the right side of (2.3), a $\xi$ assigning probability one to the set consisting
of one point in $\omega$ and one point at which the power function of the F-test differs
most from the envelope power function.

COUNTEREXAMPLES. We now list briefly four counterexamples to the conclusion
of the theorem of Sec. 3, only the third of which is new, in order to indicate that 
Assumptions 1, 2, 4, and 5 cannot be entirely dispensed with. 

(A) In [6] and also in [7] are given examples which show that the conclusion 
of the theorem of Sec. 3 is false if (in terms of the present treatment) Assumption 
4 is violated. We note here also that if, in the notation of p. 313 of [7], the weight 
function is altered to f(s) = 1 if s is an integer and f(s) = max (s, 0) otherwise, 
then there exist invariant procedures with finite risk (=1), but the conclusion 
of the theorem is still false; thus, we see that if in Condition 4c the condition that 
Wy, z) > c, as y — — © and — monotonically as y —> © were dropped while 
maintaining the condition that a $\delta_0 \in \mathcal{D}_I$ with finite risk exists, Assumption 4
would not be implied. 

(B) As Peisakoff has pointed out, the invariance theory applies to the general 
sequential case only if we restrict D to consist of procedures which take at least 
a first observation with probability one. In Section 4 we shall discuss this in 
more detail (there are cases where this restriction of D is not necessary); for the 
moment, we give an example to demonstrate that the conclusion of the theorem 
of the next section would not generally be true without such a restriction.
Suppose we are limited to taking a single observation or else no observation on a
random variable whose distribution depends only on a location parameter $\theta$
which we desire to estimate (see Example (i)), the loss from estimating $\theta$
incorrectly being bounded by 1 and the cost of experimentation being 2 or 0
depending on whether or not we take an observation. Any minimax procedure in
$\mathcal{D}$ must clearly take no observation with probability $\ge \frac{1}{2}$ (a similar remark
applying for $\varepsilon$-minimax procedures); however, the only invariant procedures
take a first observation with probability one (see Sec. 4 for further discussion).
The difficulty here is that Assumption 1 is violated, since $g_x$ must depend on
the observation and thus, for a $\delta$ which requires no observations, the $\delta_g$ of (2.1)
would require no observations but would depend on the observation, and would
thus not be a legitimate decision function.

(C) As an example which shows that Assumption 2 cannot be entirely
dispensed with, consider the setup of Example v with $F_0(x) = 0$ if $x < 0$ and $= 1$
if $x \ge 0$, and let $W = 1$ if $d = \theta_1$ and $= 0$ otherwise. This is essentially a game
where one player says "don't you name the real number I name" and then
names a real number, while the only affine-invariant procedure for the other
player is, on hearing the number, to name the same number. The procedure
$\delta^*$ of Example v is in fact uniformly worst and is clearly not minimax, while
there exist many minimax procedures in the class $C$. This example can be made
into one where all members of $\mathcal{F}$ have densities with respect to a fixed $\sigma$-finite
measure by restricting $\mathcal{X}$, $D$, and $G$ to the rationals (of course, this changes $\mu$,
and Condition 5b(2) is no longer applicable), and can be made more
probabilistic by letting $F_0$ assign probability $\frac{1}{3}$ to each of the values $-1, 0, 1$; but the
phenomenon persists. See Example v for an example of a condition which
eliminates the phenomenon encountered in Counterexample C.
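
The game in Counterexample C is simple enough to play out directly. The following sketch is an illustration only: the function names and the particular list of parameter values are hypothetical, and the competitor $x + 1$ is just one convenient member of the class $C$ (translation-equivariant with a fixed shift). Since $F_0$ is degenerate at 0, the observation equals $\theta_1$ with probability one, so the risk at each parameter point is the loss itself.

```python
def loss(d, theta1):
    """W = 1 if the named number equals theta1 exactly, 0 otherwise."""
    return 1.0 if d == theta1 else 0.0


def delta_star(x):
    """The only affine-invariant procedure: repeat the number heard."""
    return x


def delta_shift(x):
    """A member of the class C (hypothetical choice): shift the observation by 1."""
    return x + 1.0


def sup_risk(estimator, thetas):
    """F_0 is degenerate at 0, so X = theta1 w.p.1 and risk equals loss."""
    return max(loss(estimator(t), t) for t in thetas)


# A few parameter values, chosen arbitrarily for the illustration.
THETAS = [-3.0, -0.5, 0.0, 2.0, 7.25]

if __name__ == "__main__":
    print("invariant procedure:", sup_risk(delta_star, THETAS))
    print("shifted procedure:  ", sup_risk(delta_shift, THETAS))
```

On these parameter values the invariant procedure incurs the largest possible loss everywhere, while the shifted member of $C$ incurs none.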

(D) Stein [17] has announced an example in testing hypotheses where all our 
assumptions except Assumption 5 are satisfied and where the conclusion of the 
theorem is false. This example shows that the real projective group and GL (2) 
do not satisfy Assumption 5. 


3. Proof of invariance theorem. We now use a modification of the method of 
proof used in [1] under Condition 2bp and Assumptions 4 and 5, in order to 
prove the following theorem (see also Remark 7 of Sec. 2): 

THEOREM. If $G$ leaves the problem invariant and if Assumptions 1 to 5 are
satisfied, then for any $\delta \in \mathcal{D}$ and $\varepsilon > 0$ there is a $\delta' \in \mathcal{D}_I$ such that
$\bar r_{\delta'} \le \varepsilon + \bar r_\delta$. In particular, if $\delta^*$ is minimax among procedures in $\mathcal{D}_I$,
then it is minimax among procedures in $\mathcal{D}$.

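PROOF. Our first step is to prove (3.5) below. Denote right invariant measure
on $G$ by $\mu^{-1}$; i.e., $\mu^{-1}(E) = \mu(E^{-1})$. Fix $b$ and $\delta \in \mathcal{D}$ and let $\{G_n\}$ be a sequence
satisfying Assumption 5, and define

(3.1)  $H_{F,x}(g) = \int_D W^b(F, x, g^{-1}r)\, \delta(gx, dr).$

Then, for $y \in G$,

(3.2)  $\lim_{n\to\infty} \left| \int_{G_n^{-1}} [H_{F,x}(g) - H_{F,x}(g y^{-1})]\, \mu^{-1}(dg) \right| / \mu(G_n)$
       $= \lim_{n\to\infty} \left| \int_{G_n} [H_{F,x}(h^{-1}) - H_{F,x}((yh)^{-1})]\, \mu(dh) \right| / \mu(G_n)$
       $\le \lim_{n\to\infty} 2b\, \mu(yG_n\, \Delta\, G_n)/\mu(G_n) = 0,$

by Assumption 5. Using (3.2) with $y = g_x$ and bounded convergence, we obtain,
for any fixed $\xi \in S_b$,

(3.3)  $\lim_{n\to\infty} \int_{T_b} \xi(dF) \int_{\mathcal{X}} F(dx) \left| \int_{G_n^{-1}} [H_{F,x}(g) - H_{F,x}(g g_x^{-1})]\, \mu^{-1}(dg) \right| / \mu(G_n) = 0.$

It will simplify notation if we define the operation $L$ by

(3.4)  $L = \liminf_{n\to\infty} \int_{T_b} \xi(dF) \int_{\mathcal{X}} F(dx) \int_{G_n^{-1}} \mu^{-1}(dg)\, [\mu(G_n)]^{-1} \int_D.$

Using (3.3), a change of variables, and (2.1), we obtain

(3.5)  $L\, W^b(F, x, g^{-1}r)\, \delta(gx, dr) = L\, W^b(F, x, g_x g^{-1} r)\, \delta([g_x g^{-1}]^{-1} x, dr)$
       $= L\, W^b(F, x, u)\, \delta(g x_a, d_u\, g g_x^{-1} u)$
       $= L\, W^b(F, x, u)\, \delta_g(x, du).$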


Let 6¢D. Using the fact (Assumption 3) that S, includes every measure 
giving probability one to a single element of T, > {Fs} and that gX has proba- 
bility measure gF when X has measure F’, we have for any fixed 6 e D, 


(3.6)  $\bar r_\delta = \sup_b \sup_{F \in \mathcal{F}} \int_{\mathcal{X}} F(dx) \int_D W^b(F, x, r)\, \delta(x, dr)$
       $= \sup_b \sup_{\xi \in S_b} \sup_{g \in G} \int_{T_b} \xi(dF) \int_{\mathcal{X}} F(dx) \int_D W^b(gF, gx, r)\, \delta(gx, dr)$
       $\ge \sup_b \sup_{\xi \in S_b} \liminf_{n\to\infty} \int_{G_n^{-1}} \mu^{-1}(dg)\, [\mu(G_n)]^{-1}$
       $\qquad \cdot \int_{T_b} \xi(dF) \int_{\mathcal{X}} F(dx) \int_D W^b(gF, gx, r)\, \delta(gx, dr),$

where the inequality follows from the fact that an average is no greater than a
supremum. Using Fubini's theorem ($\mu^{-1}(dg)$ on $G_n^{-1}$ and $\xi(dF)F(dx)$ on $T_b \times \mathcal{X}$
are both finite) and the invariance of $W$ (i.e., $W^b(gF, gx, r) = W^b(F, x, g^{-1}r)$),
we see that the last member of (3.6) is equal to the supremum with respect to
$b$ and $\xi$ of the first member of (3.5). On the other hand, again using Fubini's
theorem, the supremum with respect to $b$ and $\xi$ of the last member of (3.5) is
equal to

(3.7)  $\sup_b \sup_{\xi \in S_b} \liminf_{n\to\infty} \int_{G_n^{-1}} \mu^{-1}(dg)\, r^b_{\delta_g}(\xi)/\mu(G_n) \ge \sup_b \sup_{\xi \in S_b} \inf_{\delta \in \mathcal{D}_I} r^b_\delta(\xi),$

the inequality following from the fact that an average is no less than an infimum
and that $\delta_g \in \mathcal{D}_I$ for $g \in G$ (see Remark 3). Using first Assumption 3 and the
fact that $\sup_{\xi \in S_b} r^b_\delta(\xi) = \bar r^b_\delta$ if $\delta \in \mathcal{D}_I$, and then using Assumption 4, we see that
the right side of (3.7) is equal to

(3.8)  $\sup_b \inf_{\delta \in \mathcal{D}_I} \bar r^b_\delta = \inf_{\delta \in \mathcal{D}_I} \bar r_\delta.$

Thus, for each $\delta \in \mathcal{D}$ the first member of (3.6) is no less than the last member
of (3.8), proving the theorem.

The above theorem does not, of course, treat the question of whether or not
a minimax procedure exists, i.e., whether $\inf_{\delta \in \mathcal{D}} \bar r_\delta$ is attained. Conditions for this
may be found, e.g., in [15] and [16]; the same conditions will usually apply for
both $\mathcal{D}$ and $\mathcal{D}_I$, so that the conclusion of our theorem can be strengthened by
the additional remark that a minimax procedure exists in $\mathcal{D}_I$ if one exists
in $\mathcal{D}$. Various conditions for the attainment of $\inf_\delta \bar r_\delta$ are also given in [1] and
[4] (see [5a]). Of course, for suitably simple $W$ one can often write down an
explicit formula for a minimax invariant procedure in the manner discussed under
Condition NR of Sec. 2; for example, by now this formula is well known in the
case studied in [6].

It is of interest to note an observation of Peisakoff to the effect that his proof 
(under Condition 2bp) will go through in many cases where the elements of $\mathcal{F}$
are not all the distributions $gF_0$ for $g \in G$, but only a suitably large subset of
these: e.g., in Example i of Sec. 2, the restriction $\theta \ge 0$ might be imposed. This
extension can also be carried out under our assumptions in certain cases where
the restricted class of elements $g$ for which $gF_0 \in \mathcal{F}$ is not compact.


4. The sequential case. Our setup in this section is that of Secs. 2 and 3 with
certain interpretations. For simplicity our description is specialized to handle
the examples stated at the end of this section, although a more general setup is
obvious. The space $\mathcal{X}$ is a product space $\mathcal{X}_1 \times \mathcal{X}_2 \times \cdots$ with denumerably many
factors or a trivial modification of such a space as in Example ii or iv of Sec. 2,
and we write a point of $\mathcal{X}$ as $x = (x_1, x_2, \cdots)$ and the random variable $X$ as
$(X_1, X_2, \cdots)$. In the examples we treat, the $\mathcal{X}_i$ will be copies of the same
Euclidean space and the $X_i$ will be independent and identically distributed
according to each $F \in \mathcal{F}$. The group $G$ will act componentwise on $\mathcal{X}$, so we may
write $gx = (gx_1, gx_2, \cdots)$. The space $D$ will be a product space $D_1 \times E$ where
the "terminal decision space" $D_1$ has the role the space $D$ had in fixed sample-size
problems and $B_D = B_{D_1} \times B_E$ where $(B_{D_1}, D_1)$ is the Borel sets on a subset
of a Euclidean space and $B_E$ contains at least the countable subsets of $E$. The
"experimental decision space" $E$ consists of all ordered $k$-tuples of (not
necessarily distinct) positive integers for $k = 0, 1, 2, \cdots$, as well as infinite sequences
of positive integers; we represent an element of $E$ by $e_k = (a_1, a_2, \cdots, a_k)$,
such a $k$-tuple representing an experiment carried out in $k$ stages, the $i$th of
which consisted of $a_i$ "observations," namely, on $X_{s_{i-1}+1}, \cdots, X_{s_i}$, where we
write $s(e)$ for the sum of the integers in $e$ and $s_i = s((a_1, \cdots, a_i))$; $e_0$ represents
the taking of no observations, and we write $e_\infty = (a_1, a_2, \cdots)$ for an $e$ where
experimentation never ceases. The group $G$ acts trivially on $E$, so that we may
write $gd = g(d_1, e) = (gd_1, e)$ in the sequel. The weight function $W$ can depend
on $F$, $d_1$, and $e$; for simplicity of exposition, in this section the weight
function $W$ will be a sum of two non-negative parts:


(4.1)  $W(F, x, (d_1, e)) = W_1(F, d_1) + W_2(e),$


although the more general form $W(F, d_1, e)$ can be treated in similar fashion.
Thus, $W_1$ takes the place of the $W$ of the fixed sample-size case and must satisfy
the invariance condition $W_1(gF, gd_1) = W_1(F, d_1)$ for all $F$, $d_1$ and $g$. The cost
of experimentation $W_2(e_k)$ we assume to be non-negative and finite if $k < \infty$
and infinite if $k = \infty$ (the cases where $W_2(e_k)$ is permitted to be infinite for $k < \infty$
in some treatments of decision theory to reflect upper limits on sampling will be
covered by restricting $\mathcal{D}$ as indicated in Remark 8 below), and we assume the
existence of a finite number $q$ and a real nondecreasing function $h$ tending to
infinity with its non-negative argument and such that, for all $k < \infty$ and $e_k$,

(4.2)  $W_2((a_1, \cdots, a_k, 1)) - W_2((a_1, \cdots, a_k)) < q,$
       $W_2(e) > h(s(e)),$
       $W_2((a_1, \cdots, a_k, a_{k+1})) \ge W_2((a_1, \cdots, a_k));$

in other words, the cost of taking one additional observation at any stage is
bounded, for any finite number $M$ only finitely many different $e$'s cost less than
$M$, and additional observations always have non-negative cost. One often imposes
on $W_2$ practical restrictions such as $W_2((a_1 + a_2)) \le W_2((a_1, a_2))$, but this is
inessential for our considerations. Typical specializations of $W_2$ often encountered
in practice are $W_2((a_1, \cdots, a_k)) = \sum_{i=1}^k W_2((a_i))$ or $= W_2((\sum_{i=1}^k a_i))$, the latter
case with $W_2((t)) = ct$ being especially important.
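
The conditions (4.2) are easy to check mechanically for a given cost function. The sketch below is a hypothetical illustration, not part of the original development: groupings $e$ are represented as Python tuples of stage sizes, the first line of (4.2) is read as bounding the cost of appending a one-observation stage, and the names `total`, `make_linear_cost`, and `satisfies_4_2` are invented for the example. It checks the linear cost $W_2((t)) = ct$ on a small, arbitrarily chosen set of groupings.

```python
def total(e):
    """s(e): total number of observations in the grouping e (tuple of stage sizes)."""
    return sum(e)


def make_linear_cost(c):
    """Cost depending on e only through s(e): W2(e) = c * s(e)."""
    def w2(e):
        return c * total(e)
    return w2


def satisfies_4_2(w2, groupings, q, h, max_stage=5):
    """Check the three lines of (4.2) on a finite list of groupings.

    Assumed reading: line 1 bounds the cost of one extra one-observation stage by q;
    line 2 requires W2(e) > h(s(e)); line 3 says an extra stage never lowers the cost.
    """
    for e in groupings:
        if not (w2(e + (1,)) - w2(e) < q):
            return False
        if not (w2(e) > h(total(e))):
            return False
        if not all(w2(e + (a,)) >= w2(e) for a in range(1, max_stage + 1)):
            return False
    return True


GROUPINGS = [(), (1,), (2, 3), (1, 1, 1), (10,)]
W2 = make_linear_cost(0.5)
# h(t) = 0.5 * t - 1 is nondecreasing, unbounded, and lies strictly below W2.
OK = satisfies_4_2(W2, GROUPINGS, q=1.0, h=lambda t: 0.5 * t - 1.0)
# With q too small the first line of (4.2) fails.
TOO_TIGHT = satisfies_4_2(W2, GROUPINGS, q=0.25, h=lambda t: 0.5 * t - 1.0)

if __name__ == "__main__":
    print(OK, TOO_TIGHT)
```

A finite check of this kind cannot prove (4.2) for all $e$, but for a cost that depends on $e$ only through $s(e)$ the three conditions reduce to elementary inequalities in $c$, $q$, and $h$.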

Denote by $B_r$ the Borel field of members of $B_{\mathcal{X}}$ which are cylinder sets with
base in $\mathcal{X}_1 \times \cdots \times \mathcal{X}_r$; i.e., a $B_r$-measurable real function of $x$ is one which
depends on $x$ only through $(x_1, \cdots, x_r)$, the only $B_0$-measurable functions being
constants. We denote by $\mathcal{D}^0$ the class of all sequential decision functions $\delta$, i.e.,
functions $\delta$ on $\mathcal{X} \times B_D$ which are probability measures on $D$ for each $x$ (see also
the discussion of the paragraph containing (4.3) below for interpretation) where,
in addition to the measurability requirements of Section 2, each $\delta \in \mathcal{D}^0$ is assumed
to satisfy the restriction that if $e = (a_1, \cdots, a_k)$ with $s(e) = r$ and if $Q_{e,a}$ is
the set of all elements $e_\infty$ or $e_j$ of $E$ of the form $e_\infty = (a_1, \cdots, a_k, a, \cdots)$ or
$e_j = (a_1, \cdots, a_k, a, a_{k+2}, \cdots, a_j)$ for all $j \ge k + 1$ and all $a_{k+2}, \cdots$, then
$\delta(x, \Delta_1 \times e)$ (for each $\Delta_1 \in B_{D_1}$) and $\delta(x, D_1 \times Q_{e,a})$ are $B_r$-measurable in $x$; that
is, the decision to stop taking observations or to take a particular number of
observations at the next stage depends only on observations which have already
been taken. Let $\mathcal{D}^i$ denote the class of all $\delta$ in $\mathcal{D}^0$ for which $\delta(x, D_1 \times e) = 0$
whenever $s(e) < i$; i.e., which for each $x$ observe at least $x_1, \cdots, x_i$ w.p.1. For
$i \ge 0$, let $\mathcal{D}^i_I$ denote the invariant procedures in $\mathcal{D}^i$; of course, $\delta$ is invariant if
$\delta(gx, g\Delta_1 \times e) = \delta(x, \Delta_1 \times e)$ for all $g$, $x$, $\Delta_1$, $e$. We have already seen in
Counterexample B of Sec. 2 that the theorem of Sec. 3 will not generally be true if $\mathcal{D} = \mathcal{D}^0$
because not all of the $\delta_g$ of Assumption 1 will be decision functions. Of course, if
$G$ were compact we could use the method of [4] directly as outlined in Sec. 1,
without any difficulty. For the examples treated at the end of this section it
will suffice to take $\mathcal{D} = \mathcal{D}^1$ or $\mathcal{D}^2$. (The sequential considerations of [1] consist
of briefly pointing out an example of the sequential setup of $\mathcal{D}$ and the necessity
of not taking $\mathcal{D} = \mathcal{D}^0$.)

The question arises, how much do we lose by restricting $\mathcal{D}$ to be $\mathcal{D}^1$ or $\mathcal{D}^2$
rather than $\mathcal{D}^0$? The answer will usually be easy to verify. For example, suppose
$D_1$, $G$, $\mathcal{F}$, and the $X_i$ are as in Example i (or i') of Sec. 2 (Examples vii to x of
the present section) and that $W_1$, which we may think of as a function of $\theta - d_1$,
tends to its supremum $w$ (say) when its argument tends to $\infty$ (or, similarly,
$-\infty$). Then any procedure $\delta$ which requires 0 observations w.p.1 clearly has
$\bar r_\delta = w$. Since any member of $\mathcal{D}^0$ can be written as a probability mixture of a
procedure in $\mathcal{D}^1$ and a procedure which requires 0 observations w.p.1, it is
evident that either every procedure requiring 0 observations w.p.1 is minimax,
or else there is a $\delta \in \mathcal{D}^1$ which is minimax. Which of these is the case will be
easy to verify in most practical examples. In particular, if $w = \infty$, the second
is always the case.

The function $\delta$ as given above is (with a different notation) the function $p$
defined in Eq. (1.3) of [15]; $\delta(x, \Delta_1 \times e)$ is the probability, when $\delta$ is used and
$X = x$, that the experiment will terminate with experimental decision $e$ and
terminal decision an element of the subset $\Delta_1$ of $D_1$. The usual representation of
a sequential decision function is obtained by letting $\bar D$ be the union of $D_1$ with
the space $L$ of positive integers and writing, for each element $e$ of $E$ and subset
$\Delta$ of $\bar D$,

(4.3)  $\delta(\Delta \mid x, e) = \dfrac{\delta(x, \Delta_1 \times e) + \delta(x, D_1 \times Q')}{\delta(x, D_1 \times Q_e)},$

where $Q_e$ is the set of all elements of the form $e_\infty = (a_1, \cdots, a_k, \cdots)$ or $e_j =
(a_1, \cdots, a_k, \cdots, a_j)$ of $E$ for all $j \ge k$, when $e = (a_1, \cdots, a_k)$ (thus, $Q_e$ is the
union of all $Q_{e,a}$ for $a > 0$, together with $e$), while $Q'$ is the union over $a \in \Delta \cap L$
of the sets $Q_{e,a}$, and we let $\Delta_1 = \Delta \cap D_1$. If the denominator of the right side of
(4.3) is 0, define $\delta(\Delta \mid x, e) = 1$ or 0 according to whether or not $1 \in \Delta \cap L$; the
definition in this case is only for definiteness and could be made in many other
ways. The left side of (4.3) represents the conditional probability, when $\delta$ is used
and given that $X = x$ and that the experiment has already proceeded
(if $e = (a_1, \cdots, a_k)$) through $k$ stages of experimentation as represented by $e$,
that a terminal decision in $\Delta_1$ is made or that the next stage of the experiment
consists of a number of observations in $\Delta_2 = \Delta \cap L$. Clearly, $\delta(\Delta \mid x, e)$ is
$B_r$-measurable in $x$ if $s(e) = r$, and the functions $\delta(\Delta \mid x, e)$ on $B_{\bar D} \times \mathcal{X} \times E$ satisfying
obvious restrictions are in 1-to-1 correspondence with the functions $\delta(x, \Delta)$ on
$\mathcal{X} \times B_D$ as described originally ($B_{\bar D}$ consists of every union of a set in $B_{D_1}$ and a
set in $B_L$). Moreover, in terms of our later description, $\delta$ is invariant if $\delta(\Delta \mid x, e) =
\delta(g\Delta \mid gx, e)$, where $g\Delta = g(\Delta_1 \cup \Delta_2) = (g\Delta_1) \cup \Delta_2$. We shall use this
representation of $\mathcal{D}^i_I$ below.

The problems we are going to consider are ones in which the difficulty
encountered in Counterexample B can be avoided as indicated above, and in which
there is a very simple sufficient sequence $\{T_i\}$ of functions on $\mathcal{X}$, $T_i$ being
$B_i$-measurable (the range space of $T_i$ is immaterial). If one does not employ the principle
of sufficiency in the manner of this section the theorem of Sec. 3 will only yield
the dependence of the stopping rule on $x_a$ ($= (x_2 - x_1, \cdots, x_n - x_1)$ after $n$
observations in Example vii, for example), nothing like the result we obtain.
Specifically, we assume (see Example xv for further remarks)

ASSUMPTION 6. For some positive integer $m$, Assumptions 1 and 2 are satisfied
for $\mathcal{D} = \mathcal{D}^m$ with $g_x$ a $B_m$-measurable function of $x$. There exists a sequence $\{T_i\}$
of functions with $T_i$ a $B_i$-measurable sufficient statistic for $[(X_1, \cdots, X_i), \mathcal{F}]$,
such that there exist conditional probability d.f.'s

$F_r(y_1, \cdots, y_r \mid t_r) = P\{g_X^{-1}(X_1, \cdots, X_r) \le (y_1, \cdots, y_r) \mid T_r(X) = t_r\}$

for $r \ge m$ with the property

(4.4)  $F_r(y_1, \cdots, y_r \mid t_r)$ does not depend on $t_r$.


It will aid understanding to consider an example at this point, Example vii
of this section. The $X_i$ are normal with unknown mean and known variance,
and $\mathcal{X}_i = G = D_1 = R^1$. We also identify $\mathcal{F}$ with $R^1$ in obvious fashion. We
can let $m = 1$ and $g_x u = u + x_1$ for $u$ in $\mathcal{X}_i$ or $D_1$, and identify the indices $a$
with sequences $x_a = g_x^{-1} x = (0, x_2 - x_1, x_3 - x_1, \cdots)$. Let $T_i = \sum_{j=1}^i X_j$.
Since $g_X^{-1} X_1 = 0$ and $g_X^{-1}(X_2, \cdots, X_r) = (X_2 - X_1, \cdots, X_r - X_1)$, the
distribution of $g_X^{-1}(X_1, \cdots, X_r)$ given that $T_r(X) = t_r$ is multivariate normal with
means and covariances independent of $t_r$, so that Assumption 6 is satisfied.
Similarly, in Example xi with $G$ the affine group and $\mathcal{X}_i = R^1$, we put
$g_x^{-1} x_i = (x_i - x_1)/(x_2 - x_1)$, etc.
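
The two objects in this example, the map sending a sample to its differences from the first observation and the running sum, behave differently under translation, which is what makes the pair convenient. The sketch below illustrates this on one arbitrarily chosen sample; the function names are hypothetical and the numbers are picked to be exactly representable so that equality comparisons are exact.

```python
def g_x_inverse(x):
    """Apply g_x^{-1}: subtract the first coordinate from every coordinate."""
    x1 = x[0]
    return tuple(xi - x1 for xi in x)


def translate(x, c):
    """Action of the group element c on a sample: add c to each coordinate."""
    return tuple(xi + c for xi in x)


def partial_sums(x):
    """The statistics T_1, T_2, ... with T_i = x_1 + ... + x_i."""
    out, acc = [], 0.0
    for xi in x:
        acc += xi
        out.append(acc)
    return out


X = (1.5, 2.0, -0.25, 4.0)   # arbitrary sample, for illustration only
SHIFT = 3.0

if __name__ == "__main__":
    # The differences are unchanged by the translation ...
    print(g_x_inverse(X), g_x_inverse(translate(X, SHIFT)))
    # ... while T_i moves by i * SHIFT.
    print(partial_sums(X), partial_sums(translate(X, SHIFT)))
```

The invariance of the differences is the elementary fact behind the stopping rule depending only on $x_a$ when sufficiency is not used.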

Assumption 6 is related to a property cited in [5a] as being proved in [4] in 
certain regular cases, to the effect that we lose nothing in the validity of the 
theorem of Sec. 3 for problems considered in [4] if we first use the principle of 
sufficiency and then apply the invariance principle to the space of a correctly 
chosen sufficient statistic. Assumption 6 also includes an additional strong 
property in that (4.4) is obviously not implied by this result of [4] (see also
Example xv below). This assumption is easily verified in Examples vii to xiv. 

Denote by $Q(s)$ the infimum of $\bar r_\delta - W_2(e_1^{(s)})$ over all $\delta$ with $\delta(x, D_1 \times e_1^{(s)}) = 1$,
where $e_1^{(s)} = (s)$, and by $Q_I(s)$ the infimum when $\delta$ is also restricted to be
invariant; thus, $Q(s)$ and $Q_I(s)$ are the values of $\inf_\delta \bar r_\delta$ over all $\delta$ or all invariant
$\delta$ for the fixed sample-size problem with sample size $s$ when the weight function
is $W_1$. We assume

ASSUMPTION 7. Either $\bar r_\delta = \infty$ for all $\delta \in \mathcal{D}^m$ or else there is an integer $m'$ with
$Q(m') < \infty$.

This assumption is easy to verify in practical cases for the examples considered 
in this section, where one will usually know $Q_I(j) < \infty$ for some $j$. The
assumption can be shown, in fact, to be implied by our other assumptions under mild
regularity conditions, although for the sake of brevity we forego such con- 
siderations here. 

The main remaining difficulty in applying the theorem of Sec. 3 to the present
problem is the verification of Assumption 4, which would usually be difficult to 
verify directly in sequential problems. Our form of the theorem which follows 
reduces this verification to the much simpler nonsequential one of Sec. 2. 

THEOREM. If $G$ leaves the problem invariant and Assumptions 3, NR, 5, 6, 7,
(4.1), and (4.2), as well as Assumption 4 for $W_1$ in each fixed sample-size problem
with sample size $\ge m$, are satisfied, and if $\mathcal{D} = \mathcal{D}^m$, then for each $\varepsilon > 0$ there
exists a fixed sample-size invariant procedure $\delta^*$ (the sample perhaps being taken
according to some grouping) with $\bar r_{\delta^*} \le \varepsilon + \inf_{\delta \in \mathcal{D}^m} \bar r_\delta$. Thus, if $Q_I(s(e)) + W_2(e)$
is minimized over $s(e) \ge m$ by $e = e'$ and if $\delta^*$ is a minimax invariant procedure
for the fixed sample-size problem with sample size $s(e')$ ignoring $W_2$, then a minimax
procedure for the sequential problem is to take $s(e')$ observations according to the
grouping $e'$ (which minimizes $W_2(e)$ over $e$ satisfying $s(e) = s(e')$) and then to
use $\delta^*$.
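
The second sentence of the theorem reduces the sequential problem to a one-dimensional minimization over the sample size. The sketch below illustrates that minimization under assumptions not taken from the text: squared-error loss for a normal mean, so that $Q_I(s) = \theta_2^2/s$; a linear cost $W_2 = cs$ depending only on the total sample size; and a finite search cap `s_max`, which is a convenience here (the growth of $W_2$ required by (4.2) is what makes a finite search sufficient). All names are hypothetical.

```python
def best_fixed_sample_size(q_invariant, cost, m, s_max):
    """Minimize Q_I(s) + W2 over m <= s <= s_max; returns (s, minimized total)."""
    candidates = range(m, s_max + 1)
    s_best = min(candidates, key=lambda s: q_invariant(s) + cost(s))
    return s_best, q_invariant(s_best) + cost(s_best)


THETA2 = 1.0   # known standard deviation (assumed for the illustration)
C = 0.01       # cost per observation (assumed for the illustration)


def q_squared_error(s):
    """Q_I(s) for squared-error loss: the variance of the sample mean."""
    return THETA2 ** 2 / s


def linear_cost(s):
    """W2 depending on the grouping only through the total sample size s."""
    return C * s


S_BEST, TOTAL = best_fixed_sample_size(q_squared_error, linear_cost, m=1, s_max=1000)

if __name__ == "__main__":
    print(S_BEST, TOTAL)
```

With these particular numbers the two terms balance at $s = \theta_2/\sqrt{c} = 10$; for other losses only `q_squared_error` would need to be replaced.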

Remark 8. Before proving the theorem we remark that the first paragraph 
of the proof below can easily be altered to handle the case where D is further 
restricted in some way such as bounding $k$ or the $a_i$ or $s(e_k)$ in $e_k = (a_1, \cdots, a_k)$,
etc. We have already noted the fact that it will usually be easy to verify whether
a minimax procedure of $\mathcal{D}^m$ or a more trivial procedure is minimax in $\mathcal{D}^0$. We
also note that one can think of $G$ as acting on $T_r$ for $r \ge m$ in the examples
treated by us, so that the conclusion of the theorem could be phrased in terms
of invariant functions of $T_r$.

PROOF OF THEOREM. We may assume $\rho = \inf_{\delta \in \mathcal{D}^m} \bar r_\delta < \infty$, the theorem being
trivial otherwise. By Assumption 7 there is an $m'$ and a procedure $\delta^0$ with
$\delta^0(x, D_1 \times e_1^{(m')}) = 1$ and $\bar r_{\delta^0} - W_2(e_1^{(m')}) = C < \infty$. Since the $X_i$ are
independent and identically distributed we can clearly assume $m' \ge m$. Let $\varepsilon$ be a
positive number. The second line of (4.2) implies the existence of a number
$N' > m$ such that any procedure $\delta \in \mathcal{D}^m$ with $\bar r_\delta < \rho + \varepsilon$ must require fewer
than $N'$ observations with probability $> 1 - \varepsilon$ for all $F \in \mathcal{F}$. For any such $\delta$
define the procedure $\delta'$ as one which proceeds like $\delta$ except that whenever
experimentation has reached a stage $e$ (including $e_0$) where $s(e) < N' \le s(e) + t$
for some $t$ with $\delta(x, e_1^{(t)} \mid e) > 0$, $\delta'$ assigns the probability $\delta(x, e_1^{(t)} \mid e)$ which $\delta$
assigned to the taking of $t$ observations at the next stage (there may of course
be several such $t$) to the taking of exactly $m'$ additional observations one-by-one
and, if these observations are taken, uses $\delta^0$ on these last $m'$ observations to reach
a terminal decision. Since the $X_i$ are independent and identically distributed,
by the first and last lines of (4.2) we clearly have $\bar r_{\delta'} < \bar r_\delta + \varepsilon(C + qm')$ and
$\delta' \in \mathcal{D}^m$. Since $\varepsilon > 0$ is arbitrary, we conclude that our theorem will be proved
if we prove it for the case where $\mathcal{D}$ is restricted to the class $\mathcal{D}^{m,N}$ of procedures
in $\mathcal{D}^m$ for which $\delta(x, D_1 \times E_N) = 1$, where $N$ is a fixed integer and $E_k$ is the
set of $e$ for which $s(e) < k$. We hereafter assume $\mathcal{D} = \mathcal{D}^{m,N}$.

In order to apply the theorem of Sec. 3 to the present case, it remains to verify
Assumption 4 when $\mathcal{D} = \mathcal{D}^{m,N}$. Let $Q_I^b(s)$ be the value of $Q_I(s)$ when $W_1$ is
replaced by $W_1^b$. By Assumption 4 in the fixed sample-size case, we have

$\lim_{b\to\infty} [W_2(e) + Q_I^b(s(e))] = W_2(e) + Q_I(s(e))$

for each fixed $e$ with $s(e) \ge m$. Since there are only finitely many $e$ with
$m \le s(e) < N$, we obtain, for $\mathcal{D} = \mathcal{D}^{m,N}$,

$\lim_{b\to\infty} \inf_{\delta \in \mathcal{D}_I} \bar r^b_\delta = \lim_{b\to\infty} \min_{m \le s(e) < N} [W_2(e) + Q_I^b(s(e))]$
$\qquad = \min_{m \le s(e) < N} [W_2(e) + Q_I(s(e))] = \inf_{\delta \in \mathcal{D}_I} \bar r_\delta,$

which is Assumption 4 for the present problem.
Applying, then, the theorem of Sec. 3, we obtain for any $\delta \in \mathcal{D}^{m,N}$ and $\varepsilon > 0$ an
invariant procedure $\delta'$ with $\bar r_{\delta'} < \bar r_\delta + \varepsilon$. Since $\delta'$ is invariant, we have

$\delta'(x, \Delta \mid e) = \delta'(g_x^{-1}x, g_x^{-1}\Delta \mid e)$
$\qquad = \delta'(g_x^{-1}x, g_x^{-1}\Delta_1 \mid e) + \delta'(g_x^{-1}x, \Delta \cap L \mid e).$

Define the procedure $\delta''$ by

(4.5)  $\delta''(x, \Delta \mid e) = E\{\delta'(g_X^{-1}X, g_X^{-1}\Delta_1 \mid e) \mid T_{s(e)}(X) = T_{s(e)}(x)\}$
       $\qquad + \int \delta'(y, \Delta \cap L \mid e)\, F_{s(e)}(dy_1, \cdots, dy_{s(e)} \mid T_{s(e)}(x)).$

Since $B_{D_1}$ = Borel sets on a Euclidean set, this defines a decision function for
some version of the conditional expected value (see, e.g., [18]). Clearly
$\delta''(x, D_1 \times E_k) = 0$ for $k \le m$ and $= 1$ for $k \ge N$, so $\delta'' \in \mathcal{D}^{m,N}$. Since $\{T_i\}$ is
sufficient, $r_{\delta''} = r_{\delta'}$. But for each $e$ and $\Delta_2 \subset L$, Assumption 6 implies that
$\delta''(x, \Delta_2 \mid e)$ is a constant. Hence $\delta''$ can be considered to be a member of the class
$\mathcal{C}$ of probability mixtures of fixed sample-size procedures of sample-sizes $s(e)$
with $m \le s(e) < N$, where the sample may be taken according to some grouping
$e$ (independent of $X$). It is easy to see that, under our assumptions, the result
of the previous paragraph remains true if $\mathcal{D}^{m,N}$ is replaced by $\mathcal{C}$ and that
Assumptions 1, 2, 3, and 5 remain satisfied; thus, the theorem of Sec. 3 is valid for $\mathcal{D} = \mathcal{C}$,
so that there is a $\delta^* \in \mathcal{C}_I \subset \mathcal{D}_I^{m,N}$ with $\bar r_{\delta^*} \le \bar r_{\delta''} + \varepsilon \le \bar r_\delta + 2\varepsilon$. This completes
the proof of the theorem, since Condition NR implies the constancy of $r_{\delta^*}$ and
hence the existence of a fixed sample-size $\delta^{**} \in \mathcal{C}_I$ with $r_{\delta^{**}} \le r_{\delta^*}$.

We note that $\delta''$ in the preceding paragraph can be proved invariant in our
examples, for an appropriate version of the first term on the right in (4.5), but
the proof as given seems just as short. The lack of dependence of $W$ on $x$ in
(4.1) is of course used in invoking sufficiency.

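EXAMPLES. We shall use the following notation in our examples, where $x$ and
$\theta_1$ are real and $\theta_2$ and $\gamma$ are positive:

$f_1(x; \theta_1, \theta_2) = \dfrac{1}{\sqrt{2\pi}\,\theta_2}\, e^{-(x - \theta_1)^2/2\theta_2^2},$

$f_2(x; \theta_1, \theta_2) = \dfrac{1}{2\theta_2}$ if $|x - \theta_1| < \theta_2$, and $= 0$ otherwise,

$f_3(x; \theta_1, \theta_2) = \dfrac{(x - \theta_1)^{\gamma - 1}\, e^{-(x - \theta_1)/\theta_2}}{\theta_2^{\gamma}\, \Gamma(\gamma)}$ if $x > \theta_1$, and $= 0$ otherwise.

In all the examples except xiv, the $X_i$ will be independent real random variables
whose common Lebesgue density will be assumed to be in some class of the above
densities, which class we identify with $\mathcal{F}$.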

(vii) $\mathcal{F}$ consists of the densities $f_1$ for $-\infty < \theta_1 < \infty$ with $\theta_2$ assumed known
and $\theta_1$ to be estimated, and hence $G = D_1 = \mathcal{F}$ = additive group of $R^1$, $W_1$ being
a function of $\theta_1 - d_1$. Note that in most practical examples Assumption 4 can
be verified by applying Condition 4a or 4b or 4c, and the question of whether
to use a procedure requiring no observations or one in $\mathcal{D}^1$ will be easy to settle.
Of course, we can take $T_i = \sum_{j=1}^i X_j$, $m = 1$, and $g_x u = u + x_1$, as previously
mentioned. Thus, the conclusion of the theorem will be satisfied for most $W_1$
and $W_2$ encountered in practice. Of course, $Q_I(s)$ is easily computed in this case
to be given by

(4.6)  $Q_I(s) = \inf_h \int_{-\infty}^{\infty} W_1(h + u)\, f_1(u; 0, \theta_2 s^{-1/2})\, du;$

and, if $h_s$ achieves the minimum, a nonrandomized sequential minimax estimator
will be given by taking $s(e')$ observations according to the grouping $e'$ described
in the statement of the theorem and then estimating $\theta_1$ by $s(e')^{-1} T_{s(e')} + h_{s(e')}$.
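
For a loss without a closed-form answer, the infimum in (4.6) can be approximated numerically. The following sketch is one possible way to do so and is not the paper's method: it uses a trapezoidal rule on a truncated range and a golden-section search over $h$, which presumes the expected loss is unimodal in $h$. The LINEX loss $W_1(t) = e^{at} - at - 1$ is chosen (as an assumption, purely for checking) because for it the minimizer and minimum are known in closed form, $h_s = -a\sigma^2/2$ and $Q_I(s) = a^2\sigma^2/2$ with $\sigma = \theta_2 s^{-1/2}$. All function names are hypothetical.

```python
import math


def normal_pdf(u, sd):
    """Density f_1(u; 0, sd) of a centered normal with standard deviation sd."""
    return math.exp(-0.5 * (u / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def expected_loss(loss, h, sd, n_grid=4001, width=10.0):
    """Trapezoidal approximation to the integral of loss(h + u) * f_1(u; 0, sd)."""
    lo, hi = -width * sd, width * sd
    step = (hi - lo) / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        u = lo + k * step
        weight = 0.5 if k in (0, n_grid - 1) else 1.0
        total += weight * loss(h + u) * normal_pdf(u, sd)
    return total * step


def q_invariant(loss, s, theta2, h_lo=-5.0, h_hi=5.0, iters=60):
    """Approximate (h_s, Q_I(s)) of (4.6) by golden-section search over h.

    Assumes the expected loss is unimodal in h on [h_lo, h_hi].
    """
    sd = theta2 / math.sqrt(s)
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = h_lo, h_hi
    c = b - phi * (b - a)
    d = a + phi * (b - a)
    fc = expected_loss(loss, c, sd)
    fd = expected_loss(loss, d, sd)
    for _ in range(iters):
        if fc < fd:
            # Minimum lies in [a, d]; reuse c as the new right interior point.
            b, d, fd = d, c, fc
            c = b - phi * (b - a)
            fc = expected_loss(loss, c, sd)
        else:
            # Minimum lies in [c, b]; reuse d as the new left interior point.
            a, c, fc = c, d, fd
            d = a + phi * (b - a)
            fd = expected_loss(loss, d, sd)
    h = 0.5 * (a + b)
    return h, expected_loss(loss, h, sd)


def linex(t, a=1.0):
    """LINEX loss, used here only because its optimum is known in closed form."""
    return math.exp(a * t) - a * t - 1.0


H_S, Q_S = q_invariant(linex, s=4, theta2=1.0)

if __name__ == "__main__":
    # Closed form for comparison: sd = 0.5, so h_s = -0.125 and Q_I(4) = 0.125.
    print(H_S, Q_S)
```

For a symmetric nondecreasing loss the search should return $h_s$ close to 0, so the estimator reduces to the sample mean.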
(vii+) We mention several extensions of Example vii: (1) The form of the
minimax (or an analogous $\varepsilon$-minimax) estimator above depends on $\theta_2$ in such a
way that if it were only known that $\theta_2$ belonged to some set $B$ (not necessarily
the set of all positive numbers) and if $W_1$ were a function of $(d_1 - \theta_1)\theta_2^{-1}$ instead
of $(d_1 - \theta_1)$, then the estimator of the previous problem vii for the case $\theta_2 = 1$
would be minimax (or $\varepsilon$-minimax) here. (2) A second extension is to note that,
for the original setup of Example vii, if $W_1$ is symmetric we can also apply the
group of reflections as in Example iii of Sec. 2. If in addition $W_1$ is nondecreasing
in $|\theta_1 - d_1|$, we obtain the sample mean ($h_s = 0$ in vii) as minimax estimator, a
result first obtained in [8], with a special case in [9]. Note that the question of
whether a procedure in $\mathcal{D}^1$ or one requiring no observations is minimax is trivial
in this case.

(viii) Same as vii, except that the possible distributions are the $f_3$ with $\gamma = 1$
and $\theta_2$ known and $\theta_1$ unknown with $-\infty < \theta_1 < \infty$. In this case $T_i = \min
(X_1, \cdots, X_i)$ and the considerations and conclusions are as in vii with $f_1$
replaced by $f_3(u; 0, \theta_2 s^{-1})$ in (4.6), the minimax estimator being $T_{s(e')} + h_{s(e')}$. A
very special case of this was obtained tediously in [11].

(ix) $\mathcal{F}$ consists of the densities $f_2$ with $\theta_1$ known and $\theta_2$ unknown, $0 < \theta_2 < \infty$.
Here $\theta_2$ is to be estimated, so $D_1 = \mathcal{F} = G^{(1)}$ = multiplicative group of positive
reals. The weight function is a function only of $\theta_2/d_1$. We can either take
$G = G^{(1)}$ or can think of $\theta_1$ as being 0 and let $G$ = direct product of $G^{(1)}$ and
$G^{(2)}$, where $G^{(2)}$ contains the identity and an element which multiplies $X_i$ by $-1$
and leaves $\mathcal{F}$ and $D$ fixed. We have

$m = 1$ and $T_i = \max(|X_1 - \theta_1|, \cdots, |X_i - \theta_1|)$

and $g_x^{-1}u = u/(x_1 - \theta_1)$ if $G = G^{(1)}$, with an obvious modification if we let
$G = G^{(1)} \times G^{(2)}$. In either case, Assumption 6 is satisfied. Of course, this problem is really the
same as that of estimating $\theta_2$ when $X_i$ has density $1/\theta_2$ for $0 < x_i < \theta_2$ and 0
otherwise (put $X_i = |X_i - \theta_1|$ above), and in this form the problem may be reduced
to that of viii by a logarithmic transformation as in Example i'. The form of the
analogue of (4.6) and of the minimax procedure are obvious. The special case
$W_1 = (\theta_2 - d_1)^2/\theta_2^2$ was considered in [11]; Condition 4c is satisfied there.


(x) ℱ consists of the densities f_3, where γ and θ_1 are known and θ_2 is unknown, 
0 < θ_2 < ∞. This is a scale parameter problem with G = the G^(1) of ix, and we 
need only remark that the theorem applies with T_i = Σ_{j=1}^{i} X_j, the analogues 
of (4.6) and the form of the minimax procedure being obvious. This problem 
was treated for a particular γ and weight function in [10] and for a special class 
of weight functions in [12]. 

(x′) If X_i has symmetric density about known θ_1, the density of |X_i − θ_1| 
being that of Example x, the same considerations apply, using also the G^(2) of 
ix. Similarly, the problem of estimating θ_2 when f_1 is the density and θ_1 is known 
can obviously be reduced to that of Example x. 

The next three examples are similar in that, in each, there is both an unknown 
location parameter θ_1 and also an unknown scale parameter θ_2, with −∞ < θ_1 < ∞ 
and 0 < θ_2 < ∞. In each case m = 2, G is the real affine group (see Example ii), 
and g_x^{-1} x_i = (x_i − x_1)/(x_2 − x_1). There are three main types of problems in 
each example: (1) estimation of both θ_1 and θ_2, so that D_1 = G, d_1 = (d_11, d_12), 
W_1 is a function of (θ_1 − d_11)/θ_2 and d_12/θ_2, and 

        g_x^{-1} d_1 = ((d_11 − x_1)/(x_2 − x_1), d_12/(x_2 − x_1)); 

(2) estimation of θ_1, where D_1 = R^1, W_1 is a function of (θ_1 − d_1)/θ_2, 
g_x^{-1} d_1 = (d_1 − x_1)/(x_2 − x_1); (3) estimation of θ_2, where D_1 = positive 
reals, W_1 is a function of d_1/θ_2, g_x^{-1} d_1 = d_1/(x_2 − x_1); of course, (2) and 
(3) can really be considered as special cases of (1) where W_1 only depends on one 
of its two arguments. For each type and example it is simple to write down an 
analogue to (4.6) and the corresponding form of a minimax procedure. In each case 
the conditions of the theorem are easily verified for many commonly used W_1, and 
the verification of whether one should use a procedure in D' or one requiring one 
or no observations is also easy. The use of the Bayes method in these examples 
would of course be much more complicated than that in [8], [11], and [12]. 

(xi) ℱ consists of all densities f_1. Putting X̄_i = i^{-1} Σ_{j=1}^{i} X_j, we have 
T_i = (X̄_i, Σ_{j=1}^{i} (X_j − X̄_i)^2) for i ≥ 2. Note that the problem of estimating  , 
even for the appropriate weight function, cannot be obtained by the method of 
[10] without some modification, because of the nature of the Cramér-Rao bound. 

(xii) ℱ consists of all densities f_2. Putting U_i = min (X_1, ..., X_i) and V_i = 
max (X_1, ..., X_i), we can take T_i = (U_i, V_i) or ((U_i + V_i)/2, V_i − U_i) 
for i ≥ 2. (The second form of T_i here and in the next example are pertinent to 
remarks made below in Example xv.) 

(xiii) ℱ consists of all densities f_3 (i.e., γ is known to be 1), and in the notation 
of the previous two examples we can take T_i = (U_i, X̄_i) or (U_i, X̄_i − U_i) 
for i ≥ 2. 

(xiv) As an example of a multivariate nature, suppose each observation takes values 
in R^J for some positive integer J, the X_i again being independent and identically 
distributed. Here X_i = (X_i1, ..., X_iJ), and we assume X_i has a multivariate normal 
distribution with the identity covariance matrix and unknown mean 
θ = (θ_1, ..., θ_J) ∈ R^J. The problem is to estimate θ, so that D_1 = ℱ = G = additive 
group of R^J and W_1 is a function of the difference between the vectors d_1 and θ. 
Taking m = 1 and T_i = X̄_i and g_x^{-1} u = u − x_1 for u ∈ R^J, the theorem is 
applicable for many common weight functions. (Examples viii to xiii have similar 
multivariate analogues.) 

(xiv+) We can extend Example xiv in the manner of vii+. In particular, 
if W_1 is an increasing function of the usual Euclidean distance between d_1 and θ, 
it is easy to see that X°“’” is a minimax sequential estimator. The orthogonal 
group also leaves the problem invariant in this case, but this fact need not be 
used in obtaining the above form of the minimax estimator, it sufficing to apply 
a result of [19]. It is interesting to note that it is shown in [20] that, when W, is 
the squared length of the distance and J > 2, this estimator is not admissible. 

(xv) As an example which illustrates the fact that the method of this section 
yields little if no T_i satisfy Assumption 6, consider the problem of estimating 
θ_1 when the X_i have density f_2 and θ_2 is known. This problem is considered for 
certain W in [15] and [21], and the minimax procedures obtained there are not 
fixed sample-size. As in Example xii, (U_i, V_i) is a minimal sufficient statistic. 
Assumption 6 cannot be satisfied for any sufficient T_i. The application of our 
method in this example would yield the form of the estimator obtained in [15] 
and [21], but would only yield the fact that the minimax stopping rule depends on 
U_i − V_i at the ith stage; the stationary form of the minimax stopping rule seems 
to depend strongly on the particular nature of f_2. It will be noted that the 
previous examples differ from this one in that in the former, but not in the latter, 
there is a natural version of T_i for i ≥ m whose range is G and such that the problem 
in terms of the T_i is left invariant by the natural operation of G on the range of T_i. 
This is the essence of the examples where the method of this section yields the 
conclusion of the theorem, although we have seen that G may be modified somewhat 
from what this statement indicates (see Examples vii+, ix, x′, and xiv+) 
to the case where the range of T_i is a subgroup or homogeneous space of G. We may 
add that, in most sequential testing problems, the invariance principle yields 
little, for reasons similar to those present in Example xv. 

Remark 9. We end this section with a remark about other versions of the 
statistical problem, such as that of minimaxing the W_1 component of the risk 
subject to a bound on the W_2 component or vice versa. This includes such 
problems as the problem of finding optimum sequential estimators of bounded relative 
error of the scale parameters in Examples ix to xiv (in [7] there is some discussion 
of this problem but our results are not obtained) and that of obtaining optimum 
sequential interval estimators of prescribed length and confidence coefficient 
for the location parameters in Examples vii and viii. The latter problem is con- 
sidered in [8] and [9] in the case of Example vii, while [8] considers also the prob- 
lem of minimaxing one component of risk subject to inequalities on two others, 
etc. The discussion of [8], [21], and [12] shows at once on application of our the- 
orem that results of all these types hold for appropriate fixed sample-size pro- 
cedures, or probability mixtures thereof, in Examples vii to xiv. 


5. Sequential problems with continuous time. In this section we will use the 
method developed in Secs. 3 and 4 to obtain certain sequential minimax results 
for decision problems concerned with stochastic processes with continuous time 
parameter. Two types of problems will be considered: in Part I of this section 
we treat problems where the invariance is present in the same form as in Sec. 4, 
while in Part II the invariance has to do with the time parameter. 

I. Extension of Section 4 to continuous time. The problems we consider here 
will be continuous time analogues of certain of the problems of Sec. 4 (in fact, 
those of Sec. 4 can be considered as special cases of those here, in the manner 
of [12]). Since the proofs are essentially identical to those of Sec. 4, we shall 
not give them. In fact, rather than to state a general theorem, we shall merely 
list three examples. In each of these the separable process {X(t), t ≥ 0} is one of 
independent and stationary increments which can be taken to be continuous on 
the right, and X(T) is sufficient for {X(t), 0 ≤ t ≤ T}. As in Sec. 4, W can be a 
function of θ^{-1}d (θ being the unknown parameter) and of the experimentation 
decision, but for convenience of exposition we discuss the case where it is a sum 
W_1 + W_2. The cost of experimentation W_2 may either be taken to be of the 
form W_2(T) if the process is observed continuously up to time T, or else the 
cost may be allowed to depend on the number and spacing of the instants at 
which the process is observed; a description of this and other modifications 
(such as the problem of having to give an estimate continuously), as well as a 
more detailed discussion of the nature of sequential decision functions in the 
case of continuous time, and of the processes considered, will be found in [12]. 
In all of the examples, assumptions on W can be treated as in Sec. 4. The 
analogue here of the restriction to D' in Sec. 4 is that we must restrict ourselves to 
the union over all ε > 0 of the classes D^ε of procedures which observe the process 
for at least 0 ≤ t ≤ ε w.p.1 for all F ∈ ℱ. When we consider D^ε, the g_x is a 
function of X(ε). As in Sec. 4, it will be easy in most practical cases to decide whether 
there will be a minimax procedure in D^ε for some ε > 0 or a minimax procedure 
which does not observe the process at all. 

In each of the three examples, our result is, under assumptions on W like those 
of Sec. 4, that there exists an invariant minimax or ε-minimax procedure which 
observes the process for a constant length of time w.p.1 (or a minimax procedure 
which does not observe the process at all). Formulas for computing the minimax 
procedure can be given as in Sec. 4 or [12], and Remark 9 of Sec. 4 applies also 
to these examples. 

(xvi) The process is the one-dimensional Wiener process with known variance 
per unit time and with EX(t) = θ_1 t, the object being to estimate θ_1. Thus, G, 
ℱ, D, and the form of W_1 are the same as in Example vii. In particular, in the 
special case of a symmetric monotone W_1, we obtain the result of Sec. 5 of [12]. 

(xvi’) For the Wiener process with unknown scale or unknown location and 
scale, it has been shown in [12] that the scale parameter can be estimated with 
arbitrarily high accuracy in arbitrarily short time. Hence, the only new practical 
problems that arise when the scale parameter is unknown do so because W_2 
reflects the number of instants at which the process is observed. In this case, 
as indicated in [12], we obtain problems analogous to Example xi with G the 
affine group, or to Example x’ (see also the next example below). In either of these 
problems there will be an invariant minimax procedure which observes the 
process at a certain set of instants specified in advance of the experiment. 

(xvii) The process is the Gamma process; i.e., X(0) = 0 and X(1) has density 
function f_3 of Sec. 4 with θ_1 = 0 and γ known, the object being to estimate the 
scale parameter θ_2. Here ℱ, D, G, and W_1 are the same as in Example x of Sec. 4. 

(xviii) Consider the J-variate Wiener process X(t) = (X_1(t), ..., X_J(t)) 
where the X_j(t) are independent with known scale factors and EX_j(t) = θ_j t, 
the θ_j being unknown, −∞ < θ_j < ∞. This is the continuous time analogue of 
Example xiv, and the considerations there and in xiv+ carry over to the present 
example. 

II. Invariance in time. We now consider a process {X(t), t ≥ 0} with unknown 
parameter θ > 0 and with the property that, if {X(t), t ≥ 0} has probability law 
labeled θ, then the process {X_c(t), t ≥ 0}, defined by X_c(t) = X(ct) where c > 0, 
has probability law labeled cθ. The most familiar process of this kind is the 
Poisson process. Another such process is the gamma process with θ_2 known and 
γ unknown. 

Suppose the weight function (for estimating θ) is a function only of d_1/θ and 
Tθ, where d_1 is the terminal decision and T is the length of time experimentation 
is carried on (modifications of the type mentioned earlier in this section and 
discussed in [12] are also possible). Then clearly the multiplicative group of 
positive reals leaves the problem invariant, where we define g({X(t)}, θ, (d_1, T)) = 
({X(gt)}, gθ, (g d_1, g^{-1}T)), the group operation being ordinary multiplication. 
The difference here from previous problems is that G acts on the process by 
shifting the time argument of a sample function by a scale factor rather than by 
operating on the values of the sample function, and that G acts nontrivially on 
the experimental decision. The reason for allowing this last action and the 
accompanying dependence of W on Tθ rather than on T lies in the form of the 
result which this setup yields when one applies the invariance theorem and 
examines the invariant procedures. 

The details here are slightly more delicate and lengthy than those in Part I, 
so we shall be content with sketching the main idea. Consider the Poisson process 
with right continuous sample functions. X(τ) is sufficient for {X(t), 0 ≤ t ≤ τ}. 
Suppose we have a nonrandomized stopping function which depends on the 
sufficient statistic, i.e., a nonnegative functional T of the process with the 
property that the event t_1 < T ≤ t_2 is measurable with respect to the Borel field 
generated by {X(t), t_1 < t ≤ t_2}. For such a T to be invariant we must have 
T(x) = cT(x_c) for all c > 0 and all sample functions x, where x_c is the sample 
function of X_c when x is the sample function of X. It is easy to see that such a 
stopping function as T(x) = constant is not invariant, while T_r(x) = first time 
t that x(t) = r, where r is a fixed positive integer, is. In the present problem we 
must restrict to decision functions which observe the process until at least 
the first time X(t) = 1 (that time gives g_x^{-1}). Under fairly general conditions one 
can verify whether or not a minimax procedure should observe the process at all 
and that, if it does, a stopping rule of the type T_r is minimax. Of course, an 
invariant nonrandomized estimator will be of the form constant/T_r. A special 
case of this result thus shows that the procedure suggested in Sec. 3 of [22] and 
which was asserted there to be minimax among all procedures using a particular 
stopping rule T_r (analogous to a fixed sample-size problem) actually has an 
optimum property among all sequential procedures: e.g., among all procedures 
which give at least the prescribed accuracy of estimation, this one minimaxes 
E_θ Tθ. 
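
The invariance condition T(x) = cT(x_c) on a stopping functional can be checked 
numerically on a single counting path. The following Python sketch is only an 
illustration under stated assumptions: a sample path is represented by its list 
of jump times, the path values and the helper names (time_scaled, 
first_time_count_reaches, fixed_time, is_invariant) are hypothetical and do not 
come from the text, and one path with one scale c is tested rather than all of them. 

```python
def time_scaled(jump_times, c):
    """Jump times of x_c(t) = x(ct): a jump of x at time s occurs at s / c in x_c."""
    return [s / c for s in jump_times]


def first_time_count_reaches(r):
    """Stopping functional T_r: first time the counting path reaches the value r."""
    def stop(jump_times):
        return sorted(jump_times)[r - 1]
    return stop


def fixed_time(t0):
    """Stopping functional that ignores the path and always stops at time t0."""
    def stop(jump_times):
        return t0
    return stop


def is_invariant(stop, jump_times, c, tol=1e-9):
    """Check T(x) = c * T(x_c) for one path x and one scale factor c > 0."""
    return abs(stop(jump_times) - c * stop(time_scaled(jump_times, c))) <= tol


# Made-up jump times of one sample path (illustrative only).
path = [0.4, 1.1, 1.7, 3.2, 4.5]
print(is_invariant(first_time_count_reaches(3), path, c=2.5))  # T_r passes
print(is_invariant(fixed_time(1.0), path, c=2.5))              # constant T fails
```

On this path the rule of type T_r satisfies the condition and the constant rule 
does not, in line with the remark above. 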
ON THE COMPARATIVE ANATOMY OF TRANSFORMATIONS 


By John W. Tukey 


Princeton University 


1. Summary. The attention of statisticians has usually been focussed on single 
transformations, rather than on families of transformations. With a growing 
appreciation of the advantages of examining the behavior of data or approx- 
imations over whole families of transformations (Moore and Tukey [2], Ans- 
combe and Tukey [1]), there arises a need for rationally planned charts for 
representing families of transformations. 

The contributions which (i) the topology of the family and (ii) a definition of 
the strength of a transformation can make to charting are studied in general and 
applied to the charting of the simple family of transformations. This family is 
defined to include all transformations of the form 


        y is replaced by z = (y + c)^p 

and all their limits. It thus includes z = log (y + c), z = e^{my} and the special case 

        z = 0 if y = y_min, 
        z = 1 otherwise, 

where y_min is the least value of y either (i) present in the data or (ii) possible, as 
well as all linear transformations of these transformations. 

Experience having shown that transformations with p ≤ 1 are much more 
frequently useful than any others, the charts developed, presented, and 
exemplified here are restricted to the part of the simple family, its central region, 
for which p ≤ 1. Separate charts are presented for two cases which should cover 
most cases which arise in practice: 

(1) Where, as with counted data and small counts, the least reasonable value 
for y + c = 0, and this value is likely to occur; 

(2) Where y + c is always safely > 0, and the range of y is through not many 
powers of 10. 


2. Introduction. In the statistics of today, transformations seem to have two 
sorts of uses: 

(1) providing approximations for theoretical purposes or general convenience; 

(2) aiding in the analysis of data by bending the data nearer the Procrustean 
bed of the assumptions underlying conventional analyses. 

Both are interesting and important. In both the quality of the work of the 
transformation is often judged by the numerical value of some suitable criterion, 
and transformations which make the value of this criterion sufficiently near some 
ideal value are acceptable. (Examples follow below.) If we are to consider any 
member of a family of transformations as a potentially useful candidate, we 
should like to understand how one or more criteria vary over the whole family. 
If we could write a sufficiently simple expression for each criterion in terms of the 
parameters of the family, we could perhaps reach this understanding alge- 
braically. But this is almost always impossible. We usually have to evaluate 
each criterion numerically for each of a number of transformations, and then 
synthesize these few numerical values into an understanding as best we can. If 
we can find a suitable chart, we can undoubtedly make much more effective use 
of these few numerical values. Hence the desire for a satisfactory chart. 

As one example, note that intermediate statistical texts often prove that, as 
the number of degrees of freedom → ∞, the distribution of χ² tends to normality. 
None, to my knowledge, goes on to remark that the same is true of any fixed 
rational function of χ², and hence, in particular, of the results of applying any of 
the transforms of the simple family. But, knowing this, we are not surprised to 
find Fisher using the 1/2 power, or Wilson and Hilferty using the 1/3 power 
of χ² in seeking for a good normal approximation. Nor does the approximate 
normality of log χ² surprise us very much. If, for a given number of degrees of 
freedom, we wish to select one transformation of χ² to approximate normality, 
our choice would be greatly eased by charts showing, for example, the deviations 
of selected percentage points of the approximations from the values appropriate 
to normality. 
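
As a small numerical illustration of comparing such approximations at a 
percentage point, the Python sketch below inverts two standard normalizing 
approximations for χ²: Fisher's, which treats √(2χ²) as normal with mean 
√(2ν − 1) and unit variance, and Wilson and Hilferty's, which treats (χ²/ν)^(1/3) 
as normal with mean 1 − 2/(9ν) and variance 2/(9ν). These explicit forms, the 
function names, and the tabled upper 5% point 12.592 for 6 degrees of freedom 
are supplied here as assumptions for the illustration; they are not quoted from 
the text. 

```python
from math import sqrt
from statistics import NormalDist


def fisher_quantile(p, nu):
    """Approximate chi-square quantile from sqrt(2*chi2) ~ N(sqrt(2*nu - 1), 1)."""
    z = NormalDist().inv_cdf(p)
    return 0.5 * (z + sqrt(2 * nu - 1)) ** 2


def wilson_hilferty_quantile(p, nu):
    """Approximate quantile from (chi2/nu)**(1/3) ~ N(1 - 2/(9 nu), 2/(9 nu))."""
    z = NormalDist().inv_cdf(p)
    a = 2.0 / (9.0 * nu)
    return nu * (1.0 - a + z * sqrt(a)) ** 3


TABLED_UPPER_5_PERCENT_6_DF = 12.592  # standard table value, assumed here

for name, approx in [("square root", fisher_quantile),
                     ("cube root", wilson_hilferty_quantile)]:
    value = approx(0.95, 6)
    print(name, round(value, 3), round(value - TABLED_UPPER_5_PERCENT_6_DF, 3))
```

Deviations of this kind, computed at several percentage points and for several 
members of a family, are the sort of numbers a chart would be asked to organize. 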

Notice how disorganized and unrelated is our information on such approximations. 
The approximate normalization by transformation of a Poisson distribution 
of average value λ can be quite well accomplished by two apparently 
unrelated transformations: z = √y and z = log (y + λ). No connection between 
these two successful approximations seems to have been recognized, although our 
definition of the strength of a transformation now suggests a reasonably close 
connection. 

In analyzing data which does not match the assumptions of the conventional 
methods of analysis, we have two choices [1]. We may bend the data to fit the 
assumptions by making a transformation. Or we may develop new methods of 
analysis with assumptions which fit the data in its “original” form somewhat 
better. If we can find a satisfactory transformation, it will almost always be 
easier and simpler to use it rather than to develop new methods of analysis. To 
judge of its satisfactoriness, we need a criterion. The precise nature of the 
criterion will depend on the situation—and on what ills we are trying to remove 
by transformation: nonadditivity of effect, nonconstancy of variance, non- 
normality of distribution, or what have you. 

If we seek to remove nonadditivity of effect, for example, we may take as our 
criterion the t-value for removable nonadditivity. An example has already been 
discussed by Moore and Tukey [2]. With a few additional values, this example is 
used below to illustrate one of the suggested chart forms. In treating other ills, 
other criteria would of course be involved. 
3. Strength and local structure of transformations. The strength of 
transformations is investigated in general in Sections 5 to 7 and applied to the simple 
family in Sections 8 to 10. The definition of strength to which we are led is such 
that the strength of 

        z = y^p 

is naturally taken as 1 − p, so that the sequence of transformations 

        z = y, 
        z = √y, 
        z = log y, 
        z = −1/√y, 
        z = −1/y, 
        ... 

is an equally spaced sequence of strength 0, 1/2, 1, 3/2, 2, ... 

If we compare an arbitrary transformation with these unmodified power 
transformations, we are led to define its power strength as 

        [log (dz/dy)|_{y_1} − log (dz/dy)|_{y_2}] / log (y_2/y_1). 

For z = y^p this definition yields 1 − p independent of y_1 and y_2. In general the 
result will depend on both y_1 and y_2, and may be thought of as the strength of 
the unmodified power transformation which resembles the given transformation 
at y = y_1 and y = y_2. 
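
The power strength, the difference of log (dz/dy) at y_1 and at y_2 divided by 
log (y_2/y_1), is easy to evaluate directly. The Python sketch below is a minimal 
illustration, assuming an increasing transformation (dz/dy > 0) whose derivative 
is supplied by the caller; the function name power_strength and the sample 
points are hypothetical. 

```python
from math import log


def power_strength(dz_dy, y1, y2):
    """[log z'(y1) - log z'(y2)] / log(y2 / y1) for an increasing z with derivative dz_dy."""
    return (log(dz_dy(y1)) - log(dz_dy(y2))) / log(y2 / y1)


# z = y**p with p = 1/2: strength 1 - p = 1/2 whatever y1 and y2 are.
print(power_strength(lambda y: 0.5 * y ** -0.5, 2.0, 7.0))

# z = log y: strength 1, the p = 0 member of the sequence.
print(power_strength(lambda y: 1.0 / y, 2.0, 7.0))

# z = log(y + 1): the result now depends on the pair (y1, y2).
print(power_strength(lambda y: 1.0 / (y + 1.0), 1.0, 10.0))
print(power_strength(lambda y: 1.0 / (y + 1.0), 10.0, 100.0))
```

For the modified logarithm the value lies between 0 and 1 and moves toward 1 as 
the pair of points moves away from zero, which is the sense in which it 
resembles different power transformations at different places. 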

The philosophy underlying our exploration and heavy use of the local proper- 
ties of a family of transformations is discussed in Sections 11 and 12. The topolo- 
gies and differential structures involved are discussed in Sections 13 and 14. The 
resulting techniques are applied to the body of the simple family in Sections 15 
to 20, and to the corners (near the identity and near z = sgn*y) in Sections 21 
to 24. (The detailed conclusions are summarized in Sections 20, 24, and 25.) 

The results of these analyses are then applied in Sections 25 to 27 to the design 
of the charts. 


4. The charts. As noticed in the Summary, the charts are confined to only part 
of the simple family, i.e., to z = (y + c)^p with p ≤ 1 and 0 ≤ y + c, and their 
limits. The restriction to positive y + c is easily understood. It ensures 
monotone-increasing, well-defined, real-valued functions. The restriction to p ≤ 1 
has no such mathematical reason. 

Its origin is entirely empirical and is basically rooted in the psychology of 
data-gatherers. Experience seems to show quite uniformly that when transformation 
helps empirical data, it is almost always for p ≤ 1. This is clearly a statement 
about how data-gatherers choose to record (and initially try to analyze) data. For 
if z is the successfully transformed variable, and if the data-gatherer had chosen to 
record w = √z, then we would have found z = w^2 to be successful. The 
data-gatherer could have done this, but for some reason he does it infrequently. Hence, 
the restriction to p ≤ 1 in the interests of convenience (particularly through the 
resulting larger scale). 

Fig. 1. Basic chart when y is a small count. 

The particular coordinate systems used in these charts are developed in 
Sections 25 to 27 to meet the requirements developed by studying the topology 
and strength of the simple family. The choices are by no means unique and do not 
deserve summarization here. They are believed, however, to be good enough 
choices to make the charts very useful. (Perhaps even better forms will evolve 
from practical experience with the presently suggested forms.) 

The first chart is intended to apply to counted data with small counts, and to 
other situations where there is a least value which is not infrequently observed. 
The chart assumes that the original data has been ‘“‘coded” (rescaled) so that the 
least value is zero and that values between 1 and 10 are quite common. The basic 
chart is shown as Fig. 1. This chart (and other basic charts) can be conveniently 
used for repeated plotting by a simple device. If a sheet of tracing paper is placed 
over the chart, and enough of the chart traced to identify locations (for this chart 
the bounding arcs will suffice), the criterion values can be entered at appropriate 
points. The tracing paper can then be removed and roughly contoured. (This 
technique of economizing on special grids was learned from Churchill Eisenhart.) 
The result of such a tracing for the nonadditivity t in the chinch-bug example 
discussed by Moore and Tukey [2] is shown as Fig. 2. The numerical entries are 
values of 10t on 20 degrees of freedom, so that the contours correspond roughly 
to 1%, 5%, and 20% levels. 

If a transformation of the simple family renders the effect of treatment and 
replication additive (or at least makes zero the population average of the 
contrast chosen to indicate removable nonadditivity), then the regions above and to 
the right of the various contours are confidence regions for this 
additivity-producing transformation. The confidence coefficients are, of course, 99%, 
95%, and 80%. Clearly this single experiment has not narrowed things down too 
much. 

Fig. 2. Use of Fig. 1 to obtain confidence areas. (Entries 10t, where t measures 
removable nonadditivity.) 


The second chart is intended for use with data ranging through a few powers of 
10 and safely away from 0. In using this chart, we think of the simple family in 
the form 


z = (y + cy) ™, 


where y_0 and ε_0 are constants selected for the particular example. The basic chart 
is shown as Fig. 3. Again the tracing technique is applicable. In Fig. 4 we show an 
application to the nonnormality of transformations of χ² on 6 degrees of freedom. 
The values shown are 

        −10 log_10 [(residual sum of squares) / (mean square for slope)] 

for the regression of 6 percentage points of χ² on the corresponding points for a 
unit normal. The plot is for ε_0 = 0.5, y_0 = 0.6. The percentage points used were 
the lower and upper 0.05%, 0.5%, and 5% points. Large entries mean close 
approximation to normality. 


The quality of the approximation can be judged from the case 1 — geo = 0.3, 
c = 0, for which 


% point          normality     3.4300 (χ²_6)^0.3 − 5.6490 
lower 0.05%       −3.291 
lower 0.5%        −2.576 
lower 5%          −1.645 
(50%)             (0.000) 
upper 5%           1.645 
upper 0.5%         2.576 
upper 0.05%        3.291 

Fig. 4. Use of Fig. 3 to illustrate approximate normality of simple transforms of χ²_6. 
(Large entries indicate good approximations. Scaled by y_0 = 0.6, ε_0 = 0.5.) 

This quality is perhaps worth noting, since the first attempt to provide an 
example failed in the following way: The attempt was to examine the normality 
of simple transforms of w_10, the range in normal samples of 10, between the 
upper and lower 0.01% points. 

The result was that 


4.56446(wy)” — 7.824436 


agrees with a unit normal to within ±0.01%, which is the apparent accuracy of 
the available cumulative distribution. The simple family often does quite well in 
transforming to normality! 


I. THE STRENGTH OF TRANSFORMATIONS 


5. The purposes of transformations. The analysis of data usually proceeds 
more easily if 

(1) effects are additive; 

(2) the error variability is constant; 

(3) the error distribution is symmetrical and possibly nearly normal. 

The conventional purposes of transformation are to increase the degrees of 
approximation to which these desirable properties hold. We may think of these 
as three purposes, but it is often true that a transformation suitable for improv- 
ing one degree of approximation is also suitable for improving one or both of the 
others. 

The failure of any one of these properties can be recognized by the failure of a 
difference y_B − y_A to equal another difference y_D − y_C, as we shall see shortly. 
Since y_B − y_A = y_D − y_C is equivalent to y_C − y_A = y_D − y_B, we may always 
assume, and shall assume, that y_A < y_B ≤ y_C < y_D. Since the change of scale 
from y to ay is usually (and wisely) accepted as having no effect on the degree of 
approximation to any of these three properties, it is better to take the ratio 
(y_B − y_A)/(y_D − y_C) (or some function of this ratio) as a measure of the failure of 
this property rather than to take the difference of the differences, (y_B − y_A) − 
(y_D − y_C), since the latter is not invariant under such changes in scale. 

We now observe in what way the differences between y_B − y_A on the one hand 
and y_D − y_C on the other can exhibit the failure of any one of the three desirable 
properties. 

Nonadditivity of effect occurs when 

        y(u_1, v_2) − y(u_1, v_1) ≠ y(u_2, v_2) − y(u_2, v_1), 

where y is a function of the two variables u and v. Usually y(u, v) represents a 
typical value (population mean, population median, etc.) of the distribution of 
values obtained for fixed u and v. 

Nonconstancy of variability is exhibited when 

        p(u_1) − q(u_1) ≠ p(u_2) − q(u_2), 

where p(u_1) is the p% point of the distribution of observed values when u = u_1, 
and q(u_1) is the corresponding q% point. 

Finally, asymmetry of distribution is exhibited when 

        q(u) − y(u) ≠ y(u) − p(u), 

where p(u) and q(u) are upper and lower percentage points cutting off the same 
tail areas and y(u) is the median, all of the distribution of values observable for a 
fixed value of u. 

In each case, failure is exhibited by a non-zero quartet. From a theoretical 
point of view our argument is successfully completed. From an empirical one, 
there remains a gap. Can we use quartets of individual observed values to study 
observed data as an indication of failures in the population? If we can, then 
defining strengths in terms of quartets makes very good empirical as well as 
theoretical sense. 

Clearly if we use observed order statistics to estimate percentage points and 
medians, we can do this for asymmetry and nonconstancy and, if we are willing 
to deal with population medians, for nonadditivity as well. Since a transform of a 
sample mean will not be precisely the mean of the transformed values, we cannot 
expect exact correspondence for nonadditivity defined for population means. If, 
however, variability for u and v fixed is not large compared with the changes due 
to alterations in u and », this disturbance will be rather unimportant and we can 
regard the means under specified conditions as pseudo-individual values. Since 
nonadditivity is most important when such additional variability is not too large, 
it is almost correct, from a practical point of view, to say that the behavior of 
quartets of individual or pseudo-individual values describes, empirically and 
adequately, the apparent failures of additivity, constancy of variance, and 
symmetry of distribution. 

We shall, for reasons which will tend to appear as we proceed, take the 
logarithm of the ratio of differences 

        log [(y_B − y_A)/(y_D − y_C)] = log (y_B − y_A) − log (y_D − y_C) 

as the measure of the degree of failure of a quartet to correspond to equal 
differences, whether nonadditivity of effect, nonconstancy of variance, or 
asymmetry of distribution is involved. There will usually be many such expressions 
(many quartets y_A < y_B ≤ y_C < y_D) which may be worthy of consideration in a 
particular problem, and transformations will usually be chosen to make these 
expressions smaller as a whole, balancing a gain on one against a loss on another. 
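
A short Python sketch of this measure, log (y_B − y_A) − log (y_D − y_C), may 
make the role of scale invariance concrete. The function names and the example 
quartets are hypothetical, chosen only for illustration. 

```python
from math import log


def quartet_failure(yA, yB, yC, yD):
    """log[(yB - yA) / (yD - yC)] for a quartet with yA < yB <= yC < yD."""
    if not (yA < yB <= yC < yD):
        raise ValueError("quartet must satisfy yA < yB <= yC < yD")
    return log(yB - yA) - log(yD - yC)


def difference_of_differences(yA, yB, yC, yD):
    """(yB - yA) - (yD - yC): the alternative that is not scale invariant."""
    return (yB - yA) - (yD - yC)


quartet = (1.0, 2.0, 4.0, 8.0)
rescaled = tuple(10.0 * y for y in quartet)  # change of scale from y to 10y

print(quartet_failure(*quartet), quartet_failure(*rescaled))
print(difference_of_differences(*quartet), difference_of_differences(*rescaled))
print(quartet_failure(1.0, 2.0, 2.0, 3.0))  # equal differences give zero
```

The log ratio is unchanged by the rescaling, while the difference of the 
differences is multiplied by the scale factor. 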
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6. The strengths of transformations. How shall we assess the strength of a 
transformation from y to z? Clearly by the extent to which it alters our measures 
of dissatisfaction. Given a quartet ys < ys S Yc < yo, which is transformed 


into a quartet z4 < zs S zc < zp, we have altered 


° Zp — 2 
into log ——enyore » 
z 


bp — Z¢ 

and for the initial quartet it is reasonable to measure the strength of the trans- 
formation by the change 

Zp — 2a log 4 — Ya 
2p — 2c Yop —~ Yc 


k(ya > UB; Ye; Yo) = log 


l g (zp ~_ za)(yp “ Yo) 
* (Zp — 2c)(Ys — Ya) 
Ge ~~ Be Zp —~ 2¢ 
— log ———_.. 
Ysa — Ya Yo — Yc 
The last form of this expression shows that there are natural definitions not only
for the results of letting a pair of arguments coalesce

$$k(y_1; y_C, y_D) = k(y_1, y_1; y_C, y_D) = \log\left.\frac{dz}{dy}\right|_1 - \log\frac{z_D - z_C}{y_D - y_C},$$

$$k(y_A, y_B; y_2) = k(y_A, y_B; y_2, y_2) = \log\frac{z_B - z_A}{y_B - y_A} - \log\left.\frac{dz}{dy}\right|_2,$$

but also for the confluent strength arising from the coalescence of both pairs

$$k(y_1; y_2) = k(y_1, y_1; y_2, y_2) = \log\left.\frac{dz}{dy}\right|_1 - \log\left.\frac{dz}{dy}\right|_2,$$

all of which involve replacing the difference quotients by their limits, the
derivatives of z with respect to y. Clearly we might go one step further, and
consider

$$k(y) = \lim_{\Delta\to 0}\frac{k(y; y + \Delta) - k(y; y)}{\Delta} = \lim_{\Delta\to 0}\frac{k(y; y + \Delta)}{\Delta} = -\left(\frac{d^2 z}{dy^2}\right)\Big/\left(\frac{dz}{dy}\right)$$

as a local measure of the strength of the transformation; however, there are
complications which we shall uncover shortly.

With the exception of this last possibility, which we disapprove of, with reasons 
to be explained, all our measures of the strength of transformations involve at 
least two arguments (up to as many as four) and the arguments fall into two 
groups, whose separation (on the y-scale, say) is important. This means that if 
we face a practical situation, and wish to use a transformation of given strength, 
we must begin by selecting one or more quartets typical of the data, and then
examine the strength of the proposed transformations for that [those] typical 
quartets. 
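
As a minimal sketch of the quartet strength defined in this section, the following Python computes k(y_A, y_B; y_C, y_D) from its last form (log difference quotient over the lower pair minus that over the upper pair). The function name `quartet_strength` and the quartet (1, 4; 9, 16) are illustrative assumptions, not taken from the paper.

```python
import math

def quartet_strength(f, y_a, y_b, y_c, y_d):
    """Strength k(y_A, y_B; y_C, y_D) of the transformation z = f(y).

    Hypothetical helper: log difference quotient over the lower pair
    minus the log difference quotient over the upper pair.
    """
    z_a, z_b, z_c, z_d = f(y_a), f(y_b), f(y_c), f(y_d)
    lower = math.log((z_b - z_a) / (y_b - y_a))
    upper = math.log((z_d - z_c) / (y_d - y_c))
    return lower - upper

# Illustrative quartet (an assumption, chosen so the square roots are exact).
quartet = (1.0, 4.0, 9.0, 16.0)
k_identity = quartet_strength(lambda y: y, *quartet)  # no transformation
k_sqrt = quartet_strength(math.sqrt, *quartet)        # square root
k_log = quartet_strength(math.log, *quartet)          # logarithm
print(k_identity, k_sqrt, k_log)
```

On this quartet the identity has strength zero, and the logarithm comes out stronger than the square root, in line with the ordering discussed later for the pure powers.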


7. The composition of transformations. Let us consider successive trans- 
formations from y to z and from z to w, together with the composed trans- 
formation from y to w. The respective measures of strength are, in an obvious 
notation: 


$$k_{yz}(y_A, y_B; y_C, y_D) = \log\frac{z_B - z_A}{y_B - y_A} - \log\frac{z_D - z_C}{y_D - y_C},$$

$$k_{zw}(z_A, z_B; z_C, z_D) = \log\frac{w_B - w_A}{z_B - z_A} - \log\frac{w_D - w_C}{z_D - z_C},$$

$$k_{yw}(y_A, y_B; y_C, y_D) = \log\frac{w_B - w_A}{y_B - y_A} - \log\frac{w_D - w_C}{y_D - y_C}.$$

It is clear that

$$k_{yw}(y_A, y_B; y_C, y_D) = k_{yz}(y_A, y_B; y_C, y_D) + k_{zw}(z_A, z_B; z_C, z_D),$$

but that, since in general

$$k_{zw}(y_A, y_B; y_C, y_D) \ne k_{zw}(z_A, z_B; z_C, z_D),$$

we will have inequality in

$$k_{yw}(y_A, y_B; y_C, y_D) \ne k_{yz}(y_A, y_B; y_C, y_D) + k_{zw}(y_A, y_B; y_C, y_D).$$


Thus strengths correctly calculated are additive, but strengths naively calculated 
are not. In particular, if w is the same function of z that z is of y, the strengths 
of the two transformations (applied successively to the same quartet) will not be 
the same, and moreover, the strength of the composite will not be twice the 
strength of either. 

Thus we see that the concept of strength is a little more subtle than might be 
supposed. 
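
The additivity statement of this section can be checked numerically. The sketch below, under the assumption of an illustrative quartet (1, 4; 9, 16) and a hypothetical helper `strength`, applies the square root twice: the second strength is evaluated on the transformed quartet, as the correct calculation requires, and the sum is compared with the strength of the composite fourth root.

```python
import math

def strength(f, quartet):
    """Quartet strength of f, evaluated on the quartet it is applied to."""
    a, b, c, d = quartet
    lower = math.log((f(b) - f(a)) / (b - a))
    upper = math.log((f(d) - f(c)) / (d - c))
    return lower - upper

ys = (1.0, 4.0, 9.0, 16.0)             # illustrative y-quartet (assumption)
zs = tuple(math.sqrt(y) for y in ys)   # the same quartet on the z-scale

k_yz = strength(math.sqrt, ys)             # first square root, on the y's
k_zw = strength(math.sqrt, zs)             # second square root, on the z's
k_yw = strength(lambda y: y ** 0.25, ys)   # composite fourth root, on the y's

correct_sum = k_yz + k_zw   # strengths correctly calculated
naive_sum = 2.0 * k_yz      # pretending the second step acts on the y's
print(k_yw, correct_sum, naive_sum)
```

The correctly calculated sum reproduces the composite strength to rounding error, while doubling the first strength overshoots noticeably.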

In particular, an attempt to use $k(y)$ directly fails because by going to the
limit after division by $\Delta$ (which we must do to obtain a finite limit) we have
lost the distinction between the original and the transformed quartet. In more
detail, and working in terms of confluent strengths and in terms of $z = f(y)$,

$$k_{yw}(y; y + \Delta) = k_{yz}(y; y + \Delta) + k_{zw}(z; z + \delta),$$

where $z + \delta = f(y + \Delta)$, so that, on division by $\Delta$ and passage to the limit,

$$k_{yw}(y) = k_{yz}(y) + \lim_{\Delta\to 0}\frac{\delta}{\Delta}\cdot\frac{k_{zw}(z; z + \delta)}{\delta}$$

$$= k_{yz}(y) + \frac{dz}{dy}\,k_{zw}(z),$$

and both the shift in argument and the multiplication by $dz/dy$ must be taken
into account.

As an example, consider $z = \sqrt{y}$ and $w = \sqrt{z}$, so that $k_{yz}(y) = 1/2y$,
$k_{zw}(z) = 1/2z$, and $dz/dy = 1/2\sqrt{y}$. We obtain, from the formula above,

$$k_{yw}(y) = \frac{1}{2y} + \left(\frac{1}{2z}\right)\left(\frac{1}{2\sqrt{y}}\right) = \frac{1}{2y} + \left(\frac{1}{2\sqrt{y}}\right)\left(\frac{1}{2\sqrt{y}}\right) = \frac{3}{4y},$$

which is exactly what we would have obtained by direct calculation. Note that
we combined 1/2y with 1/2z to obtain 3/4y, which is far from the naive answer.
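
The square-root example can be verified numerically. This sketch assumes the sign convention k(y) = -z''/z', which is the one that gives 1/2y for the square root, and uses central finite differences; the helper `local_strength`, the step size, and the evaluation point y = 4 are illustrative choices.

```python
import math

def local_strength(f, y, h=1e-4):
    """Local strength k(y) = -z''(y) / z'(y), by central finite differences.

    Assumed sign convention: the one under which sqrt has k(y) = 1/(2y).
    """
    first = (f(y + h) - f(y - h)) / (2.0 * h)
    second = (f(y + h) - 2.0 * f(y) + f(y - h)) / (h * h)
    return -second / first

y = 4.0
z = math.sqrt(y)
dz_dy = 1.0 / (2.0 * math.sqrt(y))

# Composition rule: k_yw(y) = k_yz(y) + (dz/dy) * k_zw(z), with the shifted argument z.
composed = local_strength(math.sqrt, y) + dz_dy * local_strength(math.sqrt, z)
# Direct calculation for w = y ** (1/4).
direct = local_strength(lambda v: v ** 0.25, y)
# Naive answer: adding the two local strengths at the same argument y.
naive = 2.0 * local_strength(math.sqrt, y)
print(composed, direct, 3.0 / (4.0 * y), naive)
```

Both the composed and the direct values agree with 3/(4y) to the accuracy of the finite differences, while the naive sum gives 1/y.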


II. STRENGTHS IN THE SIMPLE FAMILY 


8. Confluent strengths of powers. We now return to the simple family, and
begin with the confluent strengths of the pure powers, where

$$z = \begin{cases} y^p, & p \ne 0, \\ \log y, & p = 0. \end{cases}$$

Here we have

$$k(y_1; y_2) = \log\left.\frac{dz}{dy}\right|_1 - \log\left.\frac{dz}{dy}\right|_2,$$

$$\log\frac{dz}{dy} = \begin{cases} \log p + (p - 1)\log y, & p \ne 0, \\ -\log y, & p = 0, \end{cases}$$

$$k(y_1; y_2) = (p - 1)\log (y_1/y_2) \quad\text{for all } p.$$

Thus the strength is proportional to $p - 1$ for any pair of confluent arguments
$y_1$ and $y_2$.
If $w = z^q$, and $z = y^p$, we have

$$k_{yz}(y_1; y_2) = (p - 1)\log (y_1/y_2),$$

$$k_{zw}(z_1; z_2) = (q - 1)\log (z_1/z_2) = (q - 1)\,p\,\log (y_1/y_2),$$

$$k_{yw}(y_1; y_2) = (pq - 1)\log (y_1/y_2),$$

where we have used

$$\log (z_1/z_2) = p \log (y_1/y_2).$$

We note that the additive decomposition of strengths is, in terms of the numerical
coefficients,

$$pq - 1 = (p - 1) + p(q - 1),$$

so that if $p = q = \tfrac12$, for example, we again get

$$-\tfrac34 = -\tfrac12 - \tfrac14,$$

showing how the second square-rooting had only half the strength of the first.
The special case $q = 0$ (logarithm for the second transformation) offers no
difficulty, the decomposition being

$$-1 = (p - 1) + (-p);$$

but an attempt to put $p = 0$ (logarithm for the first transformation) fails, since
in this case

$$\log (z_1/z_2) \ne p \log (y_1/y_2)$$

and the power strengths are quite dependent on the particular values of $y_1$ and $y_2$
involved. The resulting transformations, such as $\sqrt{\log y}$, are not power
transformations and seem to have been relatively little used.


In particular, the sequence of transforms

$$\ldots,\ y,\ \sqrt{y},\ \log y,\ 1/\sqrt{y},\ 1/y,\ \ldots$$

are seen to be equally spaced in strength, where a more naive approach might
have led to $y, y^{1/2}, y^{1/4}, y^{1/8}, \ldots$ as an equally spaced sequence, a
possibility not in accordance with either a careful analysis or empirical experience.

In general, we shall refer to the ratio of $k_{yz}(y_1; y_2)$ to $-\log (y_1/y_2)$ as the
power strength of a transformation. With this definition, the power strength of
$y^p$ is $1 - p$ and the power strengths of the sequence $\sqrt{y}, \log y, 1/\sqrt{y}, \ldots$ are
1/2, 1, 3/2, 2, 5/2, ... . (For a transformation which is not a pure power, the
power strength will depend on the exact arguments $y_1$ and $y_2$.)
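
A short sketch of the power-strength ladder follows. It assumes that the sign of the derivative is removed before taking logarithms (equivalent to choosing a multiplier with the same sign as p, as is done for the modified powers in the next section); the helper names and the pair (2, 5) are illustrative.

```python
import math

def abs_derivative(p):
    """Return y -> |dz/dy| for z = y**p (p != 0) or z = log y (p == 0).

    The absolute value is an assumption made so the logarithm is defined
    for negative p; it corresponds to using sgn(p) * y**p as the transform.
    """
    if p == 0:
        return lambda y: 1.0 / y
    return lambda y: abs(p) * y ** (p - 1)

def confluent_strength(p, y1, y2):
    d = abs_derivative(p)
    return math.log(d(y1)) - math.log(d(y2))

def power_strength(p, y1, y2):
    """Ratio of the confluent strength to -log(y1 / y2)."""
    return confluent_strength(p, y1, y2) / (-math.log(y1 / y2))

y1, y2 = 2.0, 5.0  # any positive pair gives the same answer for pure powers
ladder = {p: power_strength(p, y1, y2) for p in (1.0, 0.5, 0.0, -0.5, -1.0)}
print(ladder)
```

For the pure powers the result does not depend on the pair chosen, and the values step by one half from y through the square root, logarithm, reciprocal root, and reciprocal.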


9. Confluent strengths in the simple family. We can now go on to consider the
confluent strengths of the modified power transformations

$$z = \begin{cases} a_p (y + c)^p, & p \ne 0, \\ \log (y + c), & p = 0, \end{cases}$$

where we are likely to choose the constant $a_p$ for each $p$ so that
$\operatorname{sgn} a_p = \operatorname{sgn} p$. We have

$$\log\frac{dz}{dy} = (p - 1)\log (y + c) + \text{constant}$$

and hence

$$k(y_1; y_2) = (p - 1)\log\frac{y_1 + c}{y_2 + c}.$$

The power strength will be given by

$$\frac{k(y_1; y_2)}{-\log (y_1/y_2)} = (1 - p)\,\frac{\log\,[(y_1 + c)/(y_2 + c)]}{\log (y_1/y_2)}.$$

If we put $y_1 = y - \delta$, $y_2 = y + \delta$, and expand all logarithms in terms of $\delta$, we
obtain

$$(1 - p)\,\frac{y}{y + c}\left[1 - \frac{\delta^2}{3}\,\frac{c(2y + c)}{y^2 (y + c)^2} + O(\delta^4)\right],$$

where the first term will usually be controlling, and where, by Rolle's theorem,
we may slightly redefine $y$, keeping it well between $y_1$ and $y_2$ so that the first
term is exact.

We are now able to give some explanation of the fact, mentioned earlier, that
$\sqrt{y}$ and $\log (y + \lambda)$ are both fairly good normalizing transformations for a
Poisson variate of average value $\lambda$. Their respective power strengths at $y = \lambda$
are exactly

$$\left(1 - \tfrac12\right) = \tfrac12$$

and approximately

$$(1 - 0)\,\frac{\lambda}{\lambda + \lambda} = \tfrac12,$$

so that their rough equivalence is assured.
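
The Poisson comparison can be illustrated numerically. The sketch below evaluates the exact power strength of a modified power on a pair straddling y, together with the leading term (1 - p) y / (y + c) and the second-order correction from the expansion. It assumes the reading log(y + λ) for the shifted logarithm, and the values λ = 10, δ = 1 are arbitrary illustrations.

```python
import math

def modified_power_strength(p, c, y1, y2):
    """Exact power strength of a_p * (y + c)**p (log(y + c) when p == 0)."""
    return (1.0 - p) * math.log((y1 + c) / (y2 + c)) / math.log(y1 / y2)

def leading_term(p, c, y):
    """First term of the expansion about y: (1 - p) * y / (y + c)."""
    return (1.0 - p) * y / (y + c)

def second_order(p, c, y, delta):
    """Leading term with the delta**2 correction from the expansion."""
    correction = (delta ** 2 / 3.0) * c * (2.0 * y + c) / (y ** 2 * (y + c) ** 2)
    return leading_term(p, c, y) * (1.0 - correction)

lam, delta = 10.0, 1.0  # illustrative Poisson mean and half-width (assumptions)
sqrt_strength = modified_power_strength(0.5, 0.0, lam - delta, lam + delta)
log_strength = modified_power_strength(0.0, lam, lam - delta, lam + delta)
log_leading = leading_term(0.0, lam, lam)
log_second = second_order(0.0, lam, lam, delta)
print(sqrt_strength, log_strength, log_leading, log_second)
```

Both transformations come out with power strength close to one half at y = λ, and the second-order term accounts for almost all of the small gap between the exact value and the leading term.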

A further appeal to Rolle's theorem shows that nonconfluent strengths are
always equal to confluent strengths for points near the center of each pair, and
it is easy to convince ourselves that, if ratios of not much more than 10 to 1 are
involved for $y_B/y_A$ and $y_D/y_C$, the confluent strengths give us an adequate
picture of the behavior of the modified power transformations. This disposes of
most practical situations except one. When y is a count, and zero values actually
occur, we obtain infinite ratios for $y_B/y_A$. A brief inquiry into strengths in this
case does not provide any results of much practical use, which is somewhat
unfortunate.


10. Extreme strengths. We have expressed our strengths on a logarithmic 
basis; this is quite reasonable for weak and moderate transformations, but can 
easily be misleading for extremely strong transformations in practical applica- 
tions. For example, suppose that only three different values are ever observed, 
0, 1, and 2, and that they have been transformed into 0.0000, 1.0000, and 1.0001. 

By our definitions, the transformation carrying them into 0.0000 0000, 
1.0000 0000, and 1.0000 0001 would be twice as strong. From most practical 
points of view this is quite wrong, since both transformations are so very close 
to the one carrying 0, 1, 2 into 0, 1, 1 (which by our definitions has infinite 
strength). In practice, infinite strength is not infinitely far away. 

This means that, in charting, we should pay very considerable attention to the 
behavior of strengths on our scale where they are small or moderate, much less 
attention to their behavior where they are large, and should avoid moving 
infinite strengths to infinity. 
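
The three-value example can be made concrete. The sketch below treats the observed values 0, 1, 2 as a quartet with coincident middle arguments and computes the strength of each of the two transformations from its tabulated values; the helper name `strength_from_values` is a hypothetical choice.

```python
import math

def strength_from_values(y_vals, z_vals):
    """Quartet strength from tabulated (y, z) values of a transformation."""
    y_a, y_b, y_c, y_d = y_vals
    z_a, z_b, z_c, z_d = z_vals
    lower = math.log((z_b - z_a) / (y_b - y_a))
    upper = math.log((z_d - z_c) / (y_d - y_c))
    return lower - upper

quartet = (0.0, 1.0, 1.0, 2.0)  # values 0, 1, 2 with the middle value shared
k_first = strength_from_values(quartet, (0.0, 1.0, 1.0, 1.0001))
k_second = strength_from_values(quartet, (0.0, 1.0, 1.0, 1.00000001))
ratio = k_second / k_first
print(k_first, k_second, ratio)
```

On the logarithmic scale the second transformation is twice as strong as the first, although both are practically indistinguishable from the map carrying 0, 1, 2 into 0, 1, 1.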


III. LOCAL STRUCTURES OF FAMILIES OF TRANSFORMATIONS


11. Introduction. It is natural to question the intrusion of topology and 
differential structure into the present paper. For in choosing transformations for 
empirical purposes—whether for data analysis or theoretical manipulation— 
only a limited degree of detail about a transformation is relevant. If our interest 
is coarse-grained, while the attention of topology is directed to the indefinitely 
fine grain of arbitrarily small neighborhoods, how can topology help us? There 
are at least two answers to this question, and both are quite relevant. 

First, there is a heuristic principle that the study of behavior in the small 
often, but not always, leads to results which are helpful at a larger scale. In the 
present circumstances we may reasonably expect this principle to apply without 
exception. 

Second, we are trying to chart the family of transformations in such a way 
that, when we plot values of some interesting quantity on the chart, we can use 
these values as advantageously as possible in: 

(1) understanding the relation of the transformation to the quantity; 

(2) picking a single useful transformation; or 

(3) indicating a region of acceptable transformations. 

For any and all of these ends, we should like to have the quantities of interest 
vary smoothly over the chart. 

The finest-grained aspect of smooth variation is continuity. Thus we should at 
least like to have our quantities continuous functions over the chart. To do this 
we must respect the topology of the family of transformations in choosing the 
topology of the chart. This respect need only be one-sided, however, for if 

(1) position in the family is a continuous function of position in the chart, and 

(2) the quantity is a continuous function of position in the family, 
then the quantity will be a continuous function of position on the chart. In a 
phrase, we must not bring together in the chart things which are apart in the 
family, although we may, if we wish, keep apart in the chart things which come 
together in the family. 

For all that the second answer leaves us free to pull things apart in the chart 
wherever and as much as we wish, the first answer suggests that we should re- 
strain such tendencies as much as may be reasonable. 


12. Intersection and tangency. Whether one curve intersects (we shall always 
mean “intersects at a non-zero angle” when we say “intersects”), or is instead
tangent to, another curve in the family is a relatively easy question to settle, 
and one that can usefully be asked near almost any point of the family. The 
analog of the rule just stated above may be put as follows: If the curves intersect 
in the family, we cannot allow them to be tangent on the chart; if they are 
tangent in the family, then we can, if we wish, let them intersect on the chart. 

Such considerations may arise in the interior of a family, but are more likely 
to be important on its boundary. If a family is two-dimensional (as the special 
family is), the boundary will consist of curves and corners. It is at these corners, 
which are often singularities in some other sense, that intersection and tangency
will be most critical. 

At such a corner we should avoid, at all costs, the introduction of new tan- 
gencies, and should try to preserve the tangencies of the family whenever this 
can be done without contravening more important considerations. 

We shall have to deal with at least one corner where there are several alternate
differential structures. This is likely to force us to eliminate tangencies. For if we
have two curves which are tangent in all of these structures, they may not be
tangent in the same way (“corresponding” pairs of points, one on each curve, in
one structure may not “correspond” in another). In this case, no one sort of
tangency in the chart can possibly be satisfactory. We must destroy the
tangency and chart the curves as intersecting at a finite angle.


13. Nature of the topology. The discussion above has assumed a topology
without describing it. We need to say something more. If we have a family of
transformations depending on parameters $\alpha, \beta, \gamma, \ldots$,

$$z = z(y; \alpha, \beta, \gamma, \ldots),$$

and if $\alpha = \alpha(t)$, $\beta = \beta(t)$, $\gamma = \gamma(t)$, ... defines a curve of transformations

$$z_t = z(y; \alpha(t), \beta(t), \gamma(t), \ldots),$$

when do we say that $z_t$ converges to $z_0$ as $t \to 0$? With a mild reservation, we say
that $z_t \to z_0$ if there exist $A_t$ and $B_t$ so that

$$A_t + B_t z_t(y) \to z_0(y)$$

for every relevant y. Notice two things here: First, we have to introduce the
variable linear transformation defined by $(A_t, B_t)$, since we regard the class of
all $C + D z_t$, for $C$ and $D$ fixed, as equivalent and our topology really refers to
equivalence classes of transformations. Second, the convergence is for all
relevant y, so that a change in the set of y's considered relevant may change the
topology. In our present case, the simple family, such changes only occur (i)
near the far corner, or (ii) for degenerate cases where at most two values of y are
relevant, and all non-degenerate transformations are equivalent. (A transformation
is degenerate if all its values for relevant y's are the same. The mild reservation
mentioned above is that $z_0$ should not be degenerate.)


14. Nature of the differential structure. We have also talked of “intersection” 
and “tangency” without specific basis. We have available a differential structure 
in which these terms are well defined, but it is rather different from those struc- 
tures which appear in, say, Riemannian geometry. It does not start with some- 
thing like a differential form which gives local meaning to ratios of distances 
and from which local differentiation, tangency, etc., follow. It cannot, since we 
shall not have local distances. Instead, it starts with the notion of a direction from 
one transformation to another and proceeds to the definition of a tangent. 

Given two transformations, $z_1$ and $z_2$, their difference, $z_1 - z_2$, is also a
transformation and is, of course, equivalent to any multiple of itself. If we replace
$z_1$ and $z_2$ by equivalent transformations, say $A + B z_1$ and $C + D z_2$, the difference
becomes

$$A + B z_1 - C - D z_2 = A - C + B(z_1 - z_2) + (B - D) z_2$$

$$= A - C + (B - D) z_1 + D(z_1 - z_2),$$

namely, any member of a one-parameter family of equivalence classes where
$z_1 - z_2$ is additively combined with various amounts of $z_1$ or $z_2$. The resulting
direction is defined, but not too precisely.

If, however, $z_1 = z_t$ and $z_2 = z_0$ where $z_t$ converges to $z_0$ as $t \to 0$, then we
can focus our attention on a particular $A_t + B_t z_t$ whose values converge to those
of $z_0$. If the convergence of the values is sufficiently differentiable for each y, then
we have

$$A_t + B_t z_t(y) = z_0 + t z'(y) + O(t^2),$$

where $O(t^2)$ means terms of order $t^2$ or smaller as $t \to 0$. We may regard $z'$,
considered as a transformation, as the derivative of $z_t$ at the point (at the
transformation) $z_0$.

It is well to note that $z'$ is also not uniquely defined. For if we replace $B_t$ by
$B_t + tC$, as we may without disturbing the convergence, then

$$A_t + (B_t + tC) z_t(y) = z_0 + t[z'(y) + C z_0(y)] + O(t^2),$$

and we see that $z' + C z_0$ is also a “derivative.” Except for the inevitable linear
transformations this is the most general form, and we may say that the
directions at $z_0$ correspond to the families of transformations of the form

$$A + B z' + C z_0,$$

where $A$, $B$, and $C$ are arbitrary constants with $B \ne 0$.
Two curves

$$z_t = z_0 + t z' + O(t^2)$$

and

$$z_s = z_0 + s z^{*} + O(s^2)$$

will have the same direction at $z_0$ if

$$A + B z^{*} + C z_0 = z'$$

for the relevant values of y and suitable constants $A$, $B$, and $C$.

Notice the appearance again of the relevant values of y. It will again usually 
be true that, so long as y takes at least three different values, it will not matter 
which three. But in exceptional circumstances it will matter, and it is in this way 
that a corner may receive various differential structures. 

The same sort of construction can be extended to higher derivatives. We shall 
need it in only one special case, and no general discussion seems necessary. 

With these general principles and techniques in mind, then, we can proceed
to study the simple family. 


IV. THE BODY OF THE SIMPLE FAMILY


15. Definitions and limiting forms. As previously stated, the simple family of 
transformations consists of all transformations, which can be obtained by 
compounding linear transformations with one (integral or fractional) power 
transformation, and of all limiting forms of such transformations. 

The effect of an additional linear transformation from $z$ to $A + Bz$ with $B \ne 0$
applied last is always regarded as trivial. Transformations which can be trivially 
converted into each other are equivalent. Thus z = 3y* — 17 and z = 0.001 + 
0.03777 are regarded as equivalent to each other and to z = 7’. This is not the 
case for an initial linear transformation. Thus z = log y and z = log 
(y + 10 000 000) are quite different transformations. When we are dealing with a 
curve of transformations, such as, for example, the curve of pure power trans- 
formations with powers between 1 and 0, namely 


$$\{y^p,\ 1 > p > 0\},$$

we shall not distinguish equality from equivalence, and shall freely write such
“equations” as

$$\{3y^p,\ 1 > p > 0\} = \{y^p,\ 1 > p > 0\} = \{14 - 2y^p,\ 1 > p > 0\}.$$


We are only concerned with real values, and prefer monotone functions, so
that we could take as our definition of power

$$y^p = \operatorname{sgn}(y)\,|y|^p,$$

where the signum function, $\operatorname{sgn}(y)$, is defined by

$$\operatorname{sgn}(y) = \begin{cases} +1, & y > 0, \\ 0, & y = 0, \\ -1, & y < 0. \end{cases}$$

With this definition,

$$(y^p)^q = y^{pq}$$

but

$$\frac{dy^p}{dy} = p\,y^{p-1}(\operatorname{sgn} y) = p\,y^{p-1}(\operatorname{sgn} y^{p-1}) = p\,y^{p-1}(\operatorname{sgn} y)^{p-1} = p\,|y|^{p-1}.$$

(Differences from other definitions of powers only occur for negative y, and since
negative y's occur infrequently in practice, this possible definition should be
regarded as precautionary rather than important. We shall not use it explicitly
here, merely remarking that its use would make only trivial alterations in our
results.)


The limiting forms are associated with the exceptional values of the power,
$-\infty$, 0, and $+\infty$, and with the extreme values, $-\infty$ and $+\infty$, of the constant.
For $p \to 0$, we have

$$(y + c)^p = e^{p \log (y + c)} \approx 1 + p \log (y + c),$$

which, after a suitably varying linear transformation, converges to $\log (y + c)$.
For $p \to \pm\infty$ and $c$ tending to a finite limit, the convergence is to less familiar
objects. The transformations involved depend on the particular set of values
transformed through the extreme values, $y_{\max}$ and $y_{\min}$, actually present in the
data. If we define a function $g_0$ by

$$g_0(u) = \begin{cases} 0, & u \ne 0, \\ 1, & u = 0, \end{cases}$$

we then have for $p \to -\infty$ and a suitable choice of $A_p$ and $B_p$,

$$A_p + B_p (y + c)^p \to -g_0(y - y_{\min}) = -1 + \operatorname{sgn}^+(y - y_{\min}),$$

where the last equality is valid for actually occurring y's, since $y < y_{\min}$ is
impossible.

As in the sequel, a “+” on a function indicates that negative values are to be
replaced by zero. Hence, in particular,

$$\operatorname{sgn}^+(y) = \begin{cases} 0, & y \le 0, \\ 1, & y > 0. \end{cases}$$

There remains the case where $p \to \pm\infty$ but $p/c$ tends to a finite limit. (Hence
$c \to \pm\infty$, also.) Here

$$(y + c)^p = c^p \left(1 + \frac{y}{c}\right)^p = c^p \left[\left(1 + \frac{y}{c}\right)^c\right]^{p/c},$$

and since the expression in [ ] tends to $e^y$, the whole expression, after a suitably
varying linear transformation, tends to

$$(e^y)^m = e^{my},$$

where $m = \lim p/c$.
Thus the simple family contains:

$$z = (y + c)^p,$$

$$z = \log (y + c),$$

$$z = e^{my},$$

and all linear transforms of these transformations.
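
Two of the limiting forms can be checked numerically. The sketch below uses one particular choice of the "suitably varying linear transformation" in each case, namely ((y + c)^p - 1)/p for p near 0 and c^(-p) (y + c)^p = (1 + y/c)^p for large c with p = mc; these normalizations, the helper names, and the numerical values are illustrative assumptions.

```python
import math

def normalized_power(y, c, p):
    """((y + c)**p - 1) / p, one linear renormalization of (y + c)**p."""
    return ((y + c) ** p - 1.0) / p

def scaled_power(y, c, m):
    """c**(-p) * (y + c)**p with p = m * c, written as (1 + y/c)**p."""
    p = m * c
    return (1.0 + y / c) ** p

y = 2.0
# p -> 0 with c fixed: the renormalized power approaches log(y + c).
log_gap = abs(normalized_power(y, 1.0, 1e-6) - math.log(y + 1.0))
# c -> infinity with p / c = m fixed: the scaled power approaches exp(m * y).
exp_gap = abs(scaled_power(y, 1.0e6, -0.5) - math.exp(-0.5 * y))
print(log_gap, exp_gap)
```

Both gaps shrink as p tends to 0 and as c grows, respectively, which is the sense in which the logarithm and the exponential belong to the simple family.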


16. Normalization. Because (i) they are apparently of the greatest practical
importance, and (ii) they determine the remaining cases by symmetry, we shall
confine our detailed analysis to transformations with $p \le 1$ (and their limits).
Consequently, we shall be concerned with least values but not with greatest
ones. It will therefore be convenient, for the remainder of this part, for us to fix
the initial scale of the y's so that the three smallest values of y are 0, 1, and $y_2$,
where $y_2 > 1$. This can be done by a linear transformation, and undone again by
another, so that our conclusions are to be changed by at most a linear
transformation when we are through.

We shall also assume that the reasonable values of $c$ are such that $y + c$ is
non-negative. We shall treat the general case, specializing later to the case
where $c$ is positive and bounded away from zero.


17. Tangency and intersection for c small. We now investigate the situation
along and near the curve $\{z = y^p\}$ in more detail. We begin by considering
$(y + c)^p$ as $c \to 0$, where the value of $p$ is important. If $p < 0$, and temporarily
we shall write $p = -k$, then

$$(y + c)^{-k} \begin{cases} = c^{-k} \to \infty, & y = 0, \\ \to y^{-k}, & y > 0, \end{cases}$$

$$1 - c^k (y + c)^{-k} = \begin{cases} 0, & y = 0, \\ 1 - \left(1 + \dfrac{y}{c}\right)^{-k}, & y > 0, \end{cases}$$

$$= \operatorname{sgn}^+(y) - c^k \left(y^{-k} \operatorname{sgn}^+(y)\right) + O(c^{k+1}).$$

As $c \to 0$ for $k$ fixed, the transformation tends to $\operatorname{sgn}^+(y)$ and appears like an
additive mixture of this and $y^{-k}\operatorname{sgn}^+(y)$. This additive piece is different for
different values of $k$, so that we learn that the curves of transformations

$$\{(y + c)^p,\ c \to 0\} = \{1 - c^{-p}(y + c)^p,\ c \to 0\}, \quad p < 0,$$

tend to $\operatorname{sgn}^+(y)$, but no two have a common tangent.
For $p = 0$, we must deal with $\log (y + c)$, where

$$1 + \frac{\log (y + c)}{-\log c} = \begin{cases} 0, & y = 0, \\ 1 + \dfrac{\log (y + c)}{-\log c}, & y > 0, \end{cases}$$

$$= \operatorname{sgn}^+ y + \frac{\log^+ (y + c)}{-\log c}$$

$$= \operatorname{sgn}^+ y + \frac{\log^+ y}{-\log c} + O\!\left(\frac{c}{-\log c}\right),$$

and hence the curve

$$\{\log (y + c),\ c \to 0\} = \left\{1 + \frac{\log (y + c)}{-\log c},\ c \to 0\right\}$$

also tends to $\operatorname{sgn}^+ y$, but with a different tangent.
There is also another simply described curve with $c$ small (in fact with $c = 0$)
which tends to $\operatorname{sgn}^+ y$. This is the curve

$$\{y^p,\ p \to 0\},$$

where we have

$$y^p = \begin{cases} 0, & y = 0, \\ e^{p \log y}, & y > 0, \end{cases}$$

$$= \operatorname{sgn}^+ y + p \log^+ y + O(p^2).$$

We see now that the curves

$$\{y^p,\ p \to 0\} \quad\text{and}\quad \{\log (y + c),\ c \to 0\}$$

are tangent to each other at the transformation ($z = \operatorname{sgn}^+ y$), with $p$
corresponding to $1/(-\log c)$.
For $p > 0$, we have

$$(y + c)^p = \begin{cases} c^p, & y = 0, \\ y^p \left(1 + \dfrac{c}{y}\right)^p, & y > 0, \end{cases}$$

$$= g_0(y)\,c^p + y^p + c\,p\,y^{p-1} + O(c^2),$$

so that the curve

$$\{(y + c)^p,\ c \to 0\}$$

tends to the transformation $y^p$.
Near $y^p$ we are also interested in

$$y^{p+\delta} = y^p e^{\delta \log y} = y^p + \delta\, y^p (\log y) + O(\delta^2),$$

which is easily seen to have a different tangent than the previous curves at $y^p$.


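18. Tangency and intersection for large c. If $c$ is large,

$$(y + c)^p = c^p \left(1 + \frac{y}{c}\right)^p = c^p \exp\,[p \log (1 + y/c)]$$

$$= c^p \exp\,[(py/c) - py^2/2c^2 + O(1/c^3)]$$

$$= c^p e^{py/c}\left(1 - \frac{p y^2}{2c^2} + O\!\left(\frac{1}{c^3}\right)\right).$$

Thus, if $c = 1/\epsilon$ and $p = m/\epsilon$ (note that $m < 0$ if $p$ is always $< 0$), we have

$$\epsilon^p (y + c)^p = e^{my} - \tfrac12\,\epsilon\, m y^2 e^{my} + O(\epsilon^2 e^{my}),$$

and the curve

$$\{(y + c)^p,\ p = mc,\ c \to \infty\} = \{c^{-p}(y + c)^p,\ p = mc,\ c \to \infty\}$$

tends to $e^{my}$.
We also wish to consider $\{e^{(m+\delta)y},\ \delta \to 0\}$ for which

$$e^{(m+\delta)y} = e^{my} e^{\delta y} = e^{my}(1 + \delta y + O(\delta^2))$$

$$= e^{my} + \delta\, y e^{my} + O(\delta^2 e^{my}).$$

Thus the curves

$$\{(y + c)^p,\ p = mc,\ c \to \infty\} \quad\text{and}\quad \{e^{(m+\delta)y},\ \delta \to 0\}$$

tend to the same limit, but with different tangents.
We also have to consider $c \to \infty$ and $p$ fixed, for which

$$(y + c)^p = c^p \left(1 + \frac{y}{c}\right)^p = c^p \left(1 + \frac{py}{c} + \frac{p(p - 1)}{2c^2}\,y^2 + O\!\left(\frac{1}{c^3}\right)\right),$$

$$\frac{c^{1-p}}{p}(y + c)^p - \frac{c}{p} = y + \frac{p - 1}{2c}\,y^2 + O\!\left(\frac{1}{c^2}\right),$$

so that the curves

$$\{(y + c)^p,\ c \to \infty\} = \left\{\frac{c^{1-p}}{p}(y + c)^p - \frac{c}{p},\ c \to \infty\right\}$$

all tend to $y$ with a common tangent, a natural parameter along this tangent
being $(p - 1)/2c$.
Since $e^{my}$ has been included, we must also consider the case $m \to 0$ where

$$\frac{e^{my} - 1}{m} = y + \frac{m}{2}\,y^2 + O(m^2),$$

so that the curve

$$\{e^{my},\ m \to 0\}$$

has the same tangent as the curves just considered, with $m$ corresponding to
$(p - 1)/c$.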
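
The common tangent at y can be illustrated numerically. The sketch below compares the renormalized power with its two-term expansion y + (p - 1) y^2 / (2c), and with the renormalized exponential (e^{my} - 1)/m at m = (p - 1)/c; the helper names and the values p = 1/2, c = 1000, y = 2 are illustrative assumptions.

```python
import math

def renormalized_power(y, c, p):
    """(c**(1 - p) / p) * (y + c)**p - c / p, which tends to y as c grows."""
    return (c ** (1.0 - p) / p) * (y + c) ** p - c / p

def renormalized_exponential(y, m):
    """(exp(m * y) - 1) / m, which tends to y as m -> 0."""
    return math.expm1(m * y) / m

y, c, p = 2.0, 1000.0, 0.5
power_value = renormalized_power(y, c, p)
two_term = y + (p - 1.0) / (2.0 * c) * y ** 2
exp_value = renormalized_exponential(y, (p - 1.0) / c)
print(power_value, two_term, exp_value)
```

All three values agree to within terms of order 1/c^2, so the power curves with p fixed and the exponential curve leave y in the same direction.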


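19. Tangency and intersection for c moderate. If $p \to 1$ with $c$ constant, we
learn from

$$(y + c)^{1+\delta} = (y + c)\,e^{\delta \log (y + c)} = (y + c)(1 + \delta \log (y + c) + O(\delta^2))$$

that

$$(y + c)^{1+\delta} - c = y + \delta (y + c) \log (y + c) + O(\delta^2)$$

and hence that the curves

$$\{(y + c)^p,\ p \to 1\}$$

tend to $y$, each with its own tangent.
At the other extreme, where $p = -k$ tends to $-\infty$, we have

$$1 - c^k (y + c)^{-k} = \begin{cases} 0, & y = 0, \\ 1 - \left(1 + \dfrac{1}{c}\right)^{-k}, & y = 1, \\ 1 - \left(1 + \dfrac{y}{c}\right)^{-k}, & y > 1, \end{cases}$$

$$= \operatorname{sgn}^+ y - \left(1 + \frac{1}{c}\right)^{-k} g_0(y - 1) + O\!\left(\left(1 + \frac{y_2}{c}\right)^{-k}\right);$$

hence the curves

$$\{(y + c)^p,\ p \to -\infty\} = \{1 - c^{-p}(y + c)^p,\ p \to -\infty\}$$

tend to $\operatorname{sgn}^+ y$ with a common tangent and a natural parameter which is a
multiple of $[1 + (1/c)]^{-k} = [1 + (1/c)]^p$.
We also have, as $m \to -\infty$,

$$1 - e^{my} = \begin{cases} 0, & y = 0, \\ 1 - e^{m}, & y = 1, \\ 1 - e^{my}, & y > 1, \end{cases}$$

$$= \operatorname{sgn}^+(y) - e^{m}\,(g_0(y - 1)) + O(e^{m y_2}),$$

and we see that the curve

$$\{e^{my},\ m \to -\infty\} = \{1 - e^{my},\ m \to -\infty\}$$

has the same limit and tangent, with $e^m$ playing the role of $[1 + (1/c)]^p$.
Everything is well behaved as $p \to 0$ when $(y + c)^p$ tends to $\log (y + c)$.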


20. The resulting picture. If we put together all the individual results of the
last three sections, we are forced to the qualitative relation of the curves

$$\{e^{my},\ m < 0\},$$

$$\{y^p,\ 1 > p > 0\},$$

$$\{(y + c)^p,\ p \text{ fixed } (1 > p > 0),\ 0 < c < \infty\},$$

$$\{\log (y + c),\ 0 < c < \infty\},$$

$$\{(y + c)^p,\ p \text{ fixed } (p < 0),\ 0 < c < \infty\},$$

$$\{(y + c)^p,\ c \text{ fixed},\ 1 > p > -\infty\}$$

shown in Fig. 5. These relations of limit, tangency, and intersection substantially
limit the diagrams on which we may reasonably plot the transformations
$(y + c)^p$ (with their limits) for $1 > p > -\infty$ and $0 \le c < \infty$.

Fig. 5. The topology of the simple family.


V. THE CORNERS OF THE SIMPLE FAMILY


21. The neighborhood of the near corner. We can try to learn more about the
situation near the corners at $y$ and $\operatorname{sgn}^+ y$ by carrying the expansion out to
further terms. We begin with the neighborhood of $y$, where

$$\frac{c^{1-p}}{p}(y + c)^p - \frac{c}{p} = y + \frac{p - 1}{2c}\,y^2 + \frac{(p - 1)(p - 2)}{6c^2}\,y^3 + O\!\left(\frac{1}{c^3}\right),$$

$$\frac{e^{my} - 1}{m} = y + \frac{m}{2}\,y^2 + \frac{m^2}{6}\,y^3 + O(m^3),$$

$$(y + c)^{1+\delta} - c = y + \delta (y + c) \log (y + c) + \frac{\delta^2}{2}(y + c)(\log (y + c))^2 + O(\delta^3),$$

and where, in particular, we may put $c = 0$ in the last expansion.


If, taking account of the signs of p — 1 and m in the region we are interested 
in, we set 


Cc -p P 
—c "(y + c) 
5 oy 


1 — 1) 
m 
which are clearly consistent, since $c = \infty$ and $1/c = 0$ are the limiting values
which lead to the exponentials. 

We know that the situation near y (i.e., near c = anything, p = 1) should be 
represented by an angle of some size, with the curves {(y + c)”, c fixed, p — 1} 
coming in with separate tangents and the curves {(y + c)”, p fixed, c — ©} all 
coming in with a common tangent. It is not unnatural to try to map this into 
the first quadrant of a (u, v)-plane, with the v-axis playing the role of the com- 
mon tangent. The natural identification near (0, 0) is, then, 


$$v = h, \qquad u = \frac{h}{c}.$$

However, experimentation leads to complexities, and reflection shows us that we
may reasonably alter the $h^2$ term, since it is a higher-order function of the $h$
term.

Omitting the $h^2$ term,


whence 


and the radius of curvature at (0, 0) of the curves $p$ = constant may be found
from the parametric form for a circle through (0, 0) tangent to the v-axis, namely,

$$v = R \sin\psi \approx R\psi,$$

$$u = R(1 - \cos\psi) \approx \tfrac12 R\psi^2,$$

whence

$$R \approx \frac{v^2}{2u} = \frac{1 - p}{2}.$$

We thus reproduce the situation around $y$ to this accuracy if we make the
following identifications

{curve of constant c} = {ray through origin of slope c},

{curve of constant p} = {circle with (0, 0) and (1 - p, 0) as diameter}.

We then have a map of the simple family which is sound for transformations
near the identity.

It is this map which was used by Moore and Tukey [2] and is shown as Fig. 6.
In polar coordinates, $\tan\theta = c$ and $r = (1 - p)/\sqrt{1 + c^2}$.

Fig. 6. The neighborhood of the near corner.
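
The chart coordinates near the identity can be sketched in a few lines. The code below maps a pair (p, c) to a point (u, v) using the polar form tan θ = c, r = (1 - p)/sqrt(1 + c^2), with θ measured from the u-axis; the function name `chart_point` and the sample values are illustrative assumptions.

```python
import math

def chart_point(p, c):
    """Map (p, c) to chart coordinates (u, v) near the identity transformation.

    Polar form from the text: tan(theta) = c, r = (1 - p) / sqrt(1 + c**2),
    with theta measured from the u-axis (an assumed orientation).
    """
    theta = math.atan(c)
    r = (1.0 - p) / math.sqrt(1.0 + c * c)
    return r * math.cos(theta), r * math.sin(theta)

p, c = 0.5, 2.0  # illustrative values
u, v = chart_point(p, c)

# Curve of constant c: ray through the origin of slope c.
slope = v / u
# Curve of constant p: circle with (0, 0) and (1 - p, 0) as diameter.
radius = (1.0 - p) / 2.0
circle_residual = (u - radius) ** 2 + v ** 2 - radius ** 2
print(u, v, slope, circle_residual)
```

The computed point lies on the ray of slope c and on the circle having (0, 0) and (1 - p, 0) as diameter, matching the two identifications above; the identity transformation p = 1 maps to the origin for every c.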


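22. The neighborhood of the far corner. Now we go to the neighborhood of
$\operatorname{sgn}^+ y$, and again carry more terms to find

$$1 - c^k (y + c)^{-k} = \operatorname{sgn}^+ y - \left(1 + \frac{1}{c}\right)^{-k} g_0(y - 1) - \left(1 + \frac{y_2}{c}\right)^{-k} g_0(y - y_2) + \cdots \quad (p = -k \to -\infty),$$

$$1 - e^{my} = \operatorname{sgn}^+ y - e^{m} g_0(y - 1) - e^{m y_2} g_0(y - y_2) + \cdots \quad (m \to -\infty),$$

$$1 - c^k (y + c)^{-k} = \operatorname{sgn}^+ y - \left(\frac{c}{y}\right)^{k} \operatorname{sgn}^+ y + \cdots \quad (p = -k < 0,\ c \to 0),$$

$$1 - \frac{\log (y + c)}{\log c} = \operatorname{sgn}^+ y - \frac{\log^+ y}{\log c} + \cdots \quad (c \to 0),$$

$$y^p = \operatorname{sgn}^+ y + p \log^+ y + \cdots \quad (p \to 0).$$

We again concentrate on the curves which come to a common tangent. (At
this corner these are the family $c$ = constant rather than the family $p$ =
constant.) This directs our attention to the first of the five formulas and to the
quantities

$$s = \left(1 + \frac{1}{c}\right)^{-k} \quad\text{and}\quad t = \left(1 + \frac{y_2}{c}\right)^{-k},$$
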


so that we would have 

The situation is both complex and dependent on the particular values of y.
If we are going to obtain results of wide usefulness, we should have to fix, more or
less arbitrarily, a single value for $y_2$.

In any event,

$$\frac{t}{s} = \left[\frac{1 + y_2/c}{1 + 1/c}\right]^{-k} = \left[\frac{c + y_2}{c + 1}\right]^{-k} \to y_2^{-k} \quad (c \to 0).$$

Thus, as $p$ ranges from 0 toward $-\infty$, the limiting value of $t/s$ ranges from 1 to 0
and we see that we have filled up only one octant.


Moreover, if we calculate coordinates for $y^p$ in what seems to be the same way,
namely,

$$1 - (\text{value at } 1) = 1 - 1^p = 0,$$

$$1 - (\text{value at } y_2) = 1 - y_2^p \approx -p \log y_2,$$
we find that 


Either we must accept a slit of some sort between $\{(y + c)^p,\ c \to 0\}$
and $\{y^p,\ p \to 0\}$ or we must recognize some difficulty with our procedure.
Actually, closer examination shows us that the coordinate system we are
using does not work at all well for $y^p$. It is set up to work for $1 - c^k (y + c)^{-k}$,
which is 0 for $y = 0$, and approaches 1 from below for all $y > 0$, most slowly for
the smallest y. On the other hand, $y^p$ is 0 for $y = 0$ and approaches 1 from above
for all $y \ge 1$, most slowly for the largest value of y. These are not compatible.


Essentially the only renormalization we can try is to expand $y^p(1 - Cp)$,
which leads to a first-order term proportional to

$$C - \log y,$$

which can never be made to vanish for more than one value of $y$.

We clearly should give up any attempt to get second-order accuracy in our
representation around the far corner. Indeed, when we reflect on the effects of
varying values of $y_2$, we see that we are in the situation discussed in Section 6.
Not only second-order accuracy, but also first-order accuracy should be given
up. We should arrange to have these curves intersect on the chart, even though
they are tangent in the family. The far corner is really a singularity!


Sng 
8 


23. The far corner if y + c is always positive. In view of the essential way in
which the far singularity depended on $y + c$ either being actually zero or
approaching zero, and in view of the fact that we may often have situations where
$y + c$ will not approach zero, it is probably worth while to reconsider our
normalization and to look separately at the case where $y + c$ is always positive. It
will be convenient to normalize so that $y \ge 1$, $c \ge 0$, as we may clearly do.
The only singularity is now in the vicinity of $p = -\infty$, where we have

$$1 - (1 + c)^k (y + c)^{-k} = \operatorname{sgn}^+(y - 1) - \left(\frac{y + c}{1 + c}\right)^{-k} \operatorname{sgn}^+(y - 1) \quad (c \ge 0),$$

$$1 - e^{m(y - 1)} = \operatorname{sgn}^+(y - 1) - e^{m(y - 1)} \operatorname{sgn}^+(y - 1) \quad (m < 0),$$

and where, therefore, if we adopt as coordinates $s$ and $t$ the deviations of the
values from $\operatorname{sgn}^+(y - 1)$ at the next lowest values $y_2$ and $y_3$ of y, we find

$$s = \left(\frac{1 + c}{y_2 + c}\right)^{k} \quad\text{or}\quad e^{m(y_2 - 1)},$$

$$t = \left(\frac{1 + c}{y_3 + c}\right)^{k} \quad\text{or}\quad e^{m(y_3 - 1)},$$

and we see that the limiting value of $t/s$ is always zero. Hence all the curves for
$p \to -\infty$ (and the curve for $m \to -\infty$) are tangent. Moreover the final approach
to $\operatorname{sgn}^+(y - 1)$ is exponential in speed.

For practical charting, we may either bring all the curves to a common tangent
or leave them separate, so long as we pay some attention to the curves s =
constant. If we let the curves intersect at finite angles, quantities which depend
continuously on the transformation will be continuous on the chart; they will
merely tend to be all more and more closely constant on the curves s = s₀ as
s₀ → 0.

The exact nature of the curves s = s₀ as well as the exact nature of the
tangency depends on the value of y₂. If we adopted a chart with tangency, we
would have to have a separate one for each value of y₂ to avoid introducing
discontinuities. If we adopt a chart without tangency, we may hope that the
curves s = s₀ will not be too badly distorted over a range of values for y₂. Thus
it again seems better to use a non-tangent chart.

If y₂ is arbitrarily close to 1, say y₂ = 1 + ε, we find


k k 
+= CE) (ey entcten 
2 
i+ tha 


so that −k/(1 + c) = p/(1 + c) is a natural parameter (which is constant on the
curve which is the limit of the curves s = s₀ as y₂ → 1). We find that m is
equivalent to p/(1 + c). Thus it may be well to keep the curves


onienstl constant 


l+c 


relatively short as p → −∞.





630 JOHN W. TUKEY 


24. Conclusions. We thus conclude that we need two forms of chart, depending 
on whether y + c = 0 is to be considered reasonable or not. Except near sgn*y, 
both charts are to follow the topology of Fig. 5. Near the identity transformation, 
both charts should conform to Fig. 6. 

Near sgn*y, both charts should distort the topology of the family by opening 
out the angle and making the previously tangent curves meet at finite angles. 
The details of these openings out are not clearly prescribed. In the case where
y + c is safely > 0 we will probably wish to keep the curves p/(1 + c) = constant
fairly short. The most obvious difference in the two cases will be that if y + c
can vanish, then (y + c)^p approaches sgn*(y + c) as p → 0; while if y + c
cannot vanish, it approaches sgn*(y − y_min), but only as p → −∞.


VI. CHARTING THE SIMPLE FAMILY 


25. The general case (where y + c can vanish). If we are to be fully prepared 
for the case where y + c can even be zero, then we will want our chart to have 
the following characteristics: 
(A) In the vicinity of y itself, the topology and differential properties should 
be those suggested by the analysis of Section 21: 
(1) the points with 0 ≤ c ≤ ∞, p ≤ 1 fill up the vertex of a quadrant; and
(2) polar coordinates are, roughly, given by tan θ = c, r = (1 − p)/√(1 + c²).

(B) Along the arc c = 0, 1 > p > 0, the spacing of the values should be,
roughly, uniform in p as suggested by Section 8. 

(C) In the vicinity of sgn*y the topology should either be as suggested by
Fig. 5 (and Section 20) or the tangency of the curves with c = constant,
p → −∞ should be replaced by intersection. (See Section 23.)

(D) Along the arc c = 0, 1 > p > 0, the curves p = constant should intersect
the arc and not be tangent to it. (See Section 17.)

We have, then, to find a chart form with these properties. 

To begin with, we have a figure composed of the vertex of a quadrant, and two 
curves running out the sides of the quadrant which eventually meet again (at 
sgn*y). The simplest figure which we know with these properties is a quarter 
sphere, stretching from pole to pole. In the vicinity of each pole we have a 
quadrant, and the meridians which bound this quadrant meet at the opposite 
pole. Thus, as a first step, we agree to try taking y as one pole and sgn*y as the 
other. The arc {y^p} becomes a meridian, the arc {e^{my}} becomes a meridian 90°
removed from this, and the central region fills in the quarter sphere between the 
two. 

Let θ be an angle describing the meridians, so chosen that θ = 0° for the
arc {y^p} which has c = 0, and θ = 90° for the arc {e^{my}} which corresponds to
c = ∞. Let ψ be an angle describing the parallels of latitude, with ψ = 0° for the
pole representing y, ψ = 90° at the equator, and ψ = 180° at the pole
representing sgn*y. Our problem is to find analytic representations for θ and ψ in
terms of p and c which will give us the desired properties. These we find by
successive approximations.

Near the y pole, we want

    tan θ ~ c,    ψ ~ (1 − p)/√(1 + c²),

and when we observe that 1 — p ranges up to infinity, it is natural to try 
tan 6 = ¢, oe ee ee 
2 vVl+e 
But when we realize that while 1 − p ranges up to +∞ for c ≠ 0, it only ranges
up to 1 for c = 0, it is natural to try, perhaps, 
y c 1 +p 
= - => — 
tan C, ctn 5 f=3 "reg 7? 
where we have (i) replaced 1/1 + c* by c in order to have the first term ineffec- 
tive for c = 0; (ii) introduced the new term with a factor 1/(1 + c) to make it 
ineffective for c = ∞, where the first term ~ −c/p, which is appropriate; (iii)
gone to the cotangent instead of the tangent in order to make the combination
of the two terms easy; and (iv) introduced tan² πp/2 as a function spreading the
values of p from 0 to 1 out uniformly along the arc c = 0.

This choice of functions gives a rather reasonable chart, but it still fails to meet
our requirements in that (i) the curves {(y + c)^p, c → 0} are tangent to {y^p}, and
(ii) near sgn*y we have not begun to obtain the proper local structure, since the
curves c = constant intersect, while the curves p = constant are mutually
tangent. We can try to get at both difficulties at once by changing to


a ua 4 ans c 1 + ™p 
tan@é=c+ (1 — pve, etn 5 i—>tise™ > 


This change pushes the curves c = constant upward, more strongly for lower 
c and larger 1 — p, and thus tends to meet both needs. However, the curves p = 
constant are still mutually tangent to the boundary curve c = 0 at sgn*y. In 
order to correct this in a simple way, we need only add a (—p)* term, which will 
vanish for p = 0. The result is 


rs i. _ = 2 5 1 +™p 

tand=c+ (1 — p)vWe+ (-p)’," ms*T tit. ™ >" 
26. Plotting. We now have represented the central part of the simple family on 
a quarter-sphere. For practical purposes we require a plot on the plane. The 
solution adopted here was influenced substantially by the existence of some 
special equal-area graph paper which the writer obtained from Mount Wilson 
Observatory during World War II in connection with problems of mutual inter- 
est. This graph paper represents a half-sphere area-true on a rectangle and pro- 
vides two families of spherico-polar coordinates, one with poles at the top and 
bottom of the rectangle and the other with poles at the centers of the right and 
left sides. If u, r are rectangular coordinates withO S uS2*,-lsrs+1 
defining the rectangle, then ψ and θ are related by
cosy = V1 — rcosu, 
cos 6 V1 — r ese y, 
sgn @ = sgn r. 


The combination of this representation of the half-sphere on the plane with the 
map on the quarter-sphere gives the plot already presented in Fig. 1. 


27. The case of moderate y + c. In case y + c cannot vanish, we want a plot
with a different and simpler topology. The choice 


tan 6 = ¢, 


. 


tan 5 


nae ie 
Vite 
where the transformation is written in the form 


(y + cYyo) owe 


scaled as to size by yo and as to change in exponent by é , is the natural trans- 
formation near g = 0, and meets our requirements everywhere else. 
When combined with the same graph paper, the result is as in Fig. 3. 


THE PROBLEM OF ESTIMATION


By H. Steinhaus


University of Wrocław


0. Introduction. A treatment has proved successful in m cases out of n; what is 
its efficacy? In other words: What will be the frequency of successes if we con- 
tinue to apply the same treatment in all cases of the disease it was invented for? 
A lot has been examined by sampling and m items out of n proved defective; what 
is the best estimate for the fraction defective in the whole lot? This is another 
example of the same question. A hundred people of a certain tribe have been 
classified as to their belonging to the blood groups A, B, AB and O; how are we 
to estimate the frequencies of the 4 groups in the whole tribe? It is not an over- 
statement to say that the abundance of applications gives the problem of es- 
timation an importance sufficient to rank it among the principal problems of 
science. This is the reason why, for more than a century, it has not ceased to 
haunt the ingenuity of mathematicians. 

To make things quite plain let us imagine a statistician (S) compelled by the 
devil (D) to play the following game: 

The devil has a collection of coins and he knows for each of them its prob- 
ability p of showing heads as a result of tossing; he is rich enough to have speci- 
mens for any p in (0, 1). He chooses a coin suiting his fancy and lets the statis- 
tician throw it n times; it shows heads m times and it is up to the statistician to 
give an estimate p’ of the value p which is known to D but unknown to him. 
This being done, S pays to D $(p’ − p)². S tries his best to reduce his loss, as
far as possible, by an appropriate method of guessing. If he succeeds in finding
the best method he will not regret the money lost: his will be the fame of having
solved our problem of the best estimate, if “best” is understood as minimizing
the expected square error of the estimation; the rules of the game have been
fixed by the devil in accordance with such an interpretation of our problem.

The following remark may explain the link connecting our game with the 
problem of point-estimation. The classical method solves this problem assuming 
that the distribution of coins employed by the devil is known to the statistician. 
He computes his guess, combining by Bayes’ rule this knowledge with the ob- 
served result of tossing; the guess p’ will be equal to the a posteriori value Ep. 
It can be defined also as the value of p’ that minimizes the expected
loss E(p’ − p)². Thus the rules of our game correspond to the problem of
point-estimation in the case of a known prior distribution; it is not artificial to
employ them in the case of an unknown prior distribution.

I proposed this problem in 1954 [10] in Prague and in Berlin, calling it “Das
statistische Spiel,” but it is only in 1956 that I have been informed by L. J.
Savage that it has already been solved: J. L. Hodges, Jr., and E. L. Lehmann [4]
attribute the priority to Herman Rubin in their important paper published in
1950. Another paper [3] by M. A. Girshick and L. J. Savage deals thoroughly
with the same question; the results have already been included in books [2]
and [8]. The generality of the results reached by the American school and their
priority make it necessary to explain the reasons for publishing this paper: the
first is that my exposition will perhaps be easier to read for some readers who
do not feel comfortable when spoken to in the language of consummate specialists;
the second is to be found in the last sections, which deal with some cases not
covered by the authors quoted above.

Received July 30, 1956; revised April 22, 1957.


1. A statistician who does not believe in probabilities when he is faced with a
single game does not even look at the gambling table. He says: “The number m
gives me no information whatever about p. My method is to declare p equal to ½.
In the worst case the true p will be 0 or 1 and I will lose 25 cents. Any gambler
adopting a system different from mine exposes himself to a greater loss than a
quarter of a dollar: if his p’ is greater than ½ he loses more than 25 cents for a
very small p, and if his p’ is less than ½ his loss is greater than 25 cents for a p
near unity. This is the reason why I abstain from counting heads and tails.”

The method described above denies the possibility of statistical estimation, 
since its estimate is not influenced by the results of experiments. It has, however, 
the advantage of the certainty it gives to S of never losing more than 25 cents, 
a guarantee that no other method is capable of. From this point of view we can 
define 25 cents as the “value of the game.” The devil’s point of view, however,
is different. As he is never sure that S will not guess the exact p, in which case
his gain is nil, the “value of the game” is 0 when computed by D. In both cases
we consider only the gains of D; nevertheless we get two different values: 25 cents
and 0. Such games may be called “open.”


2. As is generally known, many mathematical theories were invented to master 
our problem by the calculus of probability. Laplace did not hesitate to apply to 
it the so-called inverse probabilities or “probability of causes” derived from the
observation of effects. To avoid misunderstandings we will speak of frequencies 
of different coins instead of the probabilities of their having been chosen by D. 
Such an approach implies the substitution of a sequence of games for a single 
game, and of the mean loss for the single loss. Following this principle, we have to 
imagine S being condemned by D to perpetual gambling. The devil is allowed to 
choose a new coin in each round of the infinite sequence Q of rounds, but n re- 
mains unaltered. It is the statistician’s business to make his mean loss as small 
as possible. He could do it easily if he knew the frequency of different coins em- 
ployed. To be more explicit, let us assume that D adheres to a certain function 
f(p), the “density,” and that S knows this function. In other words, he knows 
that for any two numbers u, v (0 ≤ u ≤ v ≤ 1) the relative frequency of the
coins with u ≤ p ≤ v is ∫_u^v f(p) dp. The function f(p) is integrable, nonnegative
and ∫_0^1 f(p) dp = 1. S puts the following problem to himself: the whole sequence
Q of games can be split up into n + 1 subsequences Q_m (m = 0, 1, 2, ..., n),
Q_m being the subsequence consisting of all rounds showing the same number m
of heads. The mean loss in such a subsequence is proportional to the integral


\[
I(x, m) = \int_0^1 f(p) \binom{n}{m} p^m q^{n-m} (p - x)^2 \, dp, \qquad q = 1 - p, \tag{1}
\]


where x is the statistician’s answer to m heads. Thus he has to find a function
p’ = g(m, n) which gives to I(x, m) the least value when substituted for x in the
integral (1). Since the (n + 1) values p’ takes for m = 0, 1, ..., n are
independent, S has to solve the problem I(x, m) = minimum separately for each m.
This leads to a function g(m, n) that minimizes the mean loss in the whole
sequence Q. The problem is easily solved, for x is the root of the equation
dI(x, m)/dx = 0. Calling it x_m, we get immediately


\[
x_m = \int_0^1 f(p)\, p^{m+1} q^{n-m} \, dp \Big/ \int_0^1 f(p)\, p^{m} q^{n-m} \, dp = g(m, n). \tag{2}
\]


Formula (2) is the clue to the problem: S has to declare p’ = g(m, n) in every
game, g(m, n) being defined by the ratio (2). It is to be noted that the integral
(1) is a nonnegative quadratic function of x, which implies that the root of I’ = 0
is a perfect minimum of I: for any x different from x_m, we get I(x, m) > I(x_m, m).

A nontrivial case is given by the Bayes hypothesis f(p) = 1. Formula (2) 
then leads immediately to 


\[
g(m, n) = \frac{m + 1}{n + 2} \tag{3}
\]
as the best method for S. Laplace employed such a hypothesis and (3), and he 
has been rebuked for it by his successors. An example of another kind is the 
favorite method of many naturalists and other scientists, who put p’ = m/n. 
We now have to determine f(p) so as to satisfy (2) identically in m (n is fixed once 
for all) with g(m, n) replaced by m/n. Putting m = n we immediately get the 
condition 


\[
\int_0^1 f(p)\, p^{n+1} \, dp = \int_0^1 f(p)\, p^{n} \, dp. \tag{4}
\]


The functions integrated being nonnegative and p^{n+1} less than p^n everywhere
in (0, 1), (4) implies f(p) = 0 (almost everywhere), which contradicts the
condition ∫_0^1 f(p) dp = 1, essential for any density. Thus the estimate m/n does not
correspond to any density f(p) whatever. It is a remarkable fact that some 
surgeons have reached the same conclusion without being mathematicians and 
without knowing how to improve the estimate [7]. 
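
As a quick check of the passage above, the following minimal sketch evaluates the ratio (2) exactly for the uniform density f(p) = 1 and compares it with formula (3). The helper names `beta_integral` and `bayes_estimate_uniform` are ours, introduced only for illustration; the sketch relies on the standard closed form of the integral of p^a (1 − p)^b over (0, 1) for nonnegative integers a and b.

```python
from fractions import Fraction
from math import factorial


def beta_integral(a, b):
    """Exact integral of p**a * (1 - p)**b over [0, 1] for integers a, b >= 0."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))


def bayes_estimate_uniform(m, n):
    """Ratio (2) with f(p) = 1 (hypothetical helper, not from the paper)."""
    numerator = beta_integral(m + 1, n - m)    # integral of p^(m+1) q^(n-m)
    denominator = beta_integral(m, n - m)      # integral of p^m q^(n-m)
    return numerator / denominator


# Compare with formula (3) for every possible number of heads.
n = 5
for m in range(n + 1):
    assert bayes_estimate_uniform(m, n) == Fraction(m + 1, n + 2)
```

Exact rational arithmetic is used so that the agreement with (3) is an identity rather than a floating-point coincidence.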


3. The modern theory of games as codified in [5] and extended, for example, 
in [12], guarantees the existence of a solution of our problem. It proves, for such 
two-person games as are considered here, the existence of a density f(p), of a
sequence {g_j(m, n)} of guessing-methods, and of a number V (called the value
of the game) such that

1° the statistician applying the guess p’ = g_j(m, n) in the jth round is sure to
lose no more than V in the mean whatever the devil does, the mean being taken
over the results of the whole sequence Q.

2° the devil who mixes his coins in conformity to the frequency function f(p) 
is sure to win in the mean at least V regardless of what S does. The use of the 
frequency function f(p) by D is to be understood as the use of a sequence of coins
such that the corresponding sequence {p} has a relative frequency ∫_u^v f(p) dp
of terms belonging to the interval (u, v) whatever this interval may be
(0 ≤ u ≤ v ≤ 1). It follows immediately from 1° and 2° that

3° the “mixed strategy” defined by {g_j} is the best for S in the minimax
sense of “best.”

It is also clear that V, the “value of the game,” is the fixed indemnity which,
when paid by D to S before every choice of a coin, makes the play fair; such 
settlement would not influence our theory. 

The importance of the theorem of J. von Neumann leading (through its ex- 
tensions) to the statements mentioned lies, as far as our endeavor is concerned, 
in the assurance that it is worthwhile to seek an effective formula for the estimate 
p’. Nevertheless, our reader is neither supposed to know the theory of games nor 
the somewhat intricate terminology of the books quoted ([5] and [12]). Our
argument will be independent of the general theorem, and it will yield in our 
special problem more than is promised by 1°, 2°, and 3°. The difference between 
the result of Sec. 1 and the objective of the following sections is also worth notic- 
ing: we could not guarantee to S a loss less than 25 cents in a single round but 
we can—as will be shown—reduce his mean loss in a sequence of rounds to a 
quantity V less than 25 cents by finding a minimax solution which admits no 
further improvement. 

It is necessary to explain why we are not entirely satisfied by the promise 
contained in 1°, 2°, and 3°. What we need is an estimate p’ = g(m, n) valid for 
the whole sequence Q of successive rounds; estimates varying from round to 
round would introduce an indeterminism of a dangerous kind which would make 
the scientist utter different judgments in identical situations: if g_j(m, n) ≠
g_k(m, n) for the same couple m, n, he is compelled to estimate differently the
efficacy of two antibiotics, although both of them gave m positive results in
n experiments; he is compelled to such behavior because the first experiment was
scheduled “No. j” and the other “No. k.” In Sec. 1 we have seen that no
satisfactory estimate can be reached by making it an absolute constant; now we reject
estimates which have the drawback of variability, though it is not universally 
agreed that such estimates are irrational. The case against them is forcibly put 
by R. A. Fisher, pp. 97-98 of [1]. 


4. Let us suppose now that D has chosen once for all a particular coin char- 
acterized by p, whereas S employs the particular function g(m, n) of (3) as his 
method of guessing, and let us call E the mean gain of the devil. We get

\[
E = \sum_{m=0}^{n} \binom{n}{m} p^m q^{n-m} \left(p - \frac{m+1}{n+2}\right)^2
  = \sum_{m=0}^{n} \binom{n}{m} p^m q^{n-m}
    \left[p^2 - 2p\,\frac{m+1}{n+2} + \left(\frac{m+1}{n+2}\right)^2\right]. \tag{5}
\]


To compute E we make use of the trivial equality

\[
\sum_{m=0}^{n} \binom{n}{m} p^m q^{n-m} = 1
\]

and of the following identities:

\[
\sum_{m=0}^{n} \binom{n}{m} m\, p^m q^{n-m} = np, \qquad
\sum_{m=0}^{n} \binom{n}{m} m^2 p^m q^{n-m} = n^2 p^2 + npq; \tag{6}
\]

they are, respectively, familiar formulae for the first and second moments of
the binomial distribution. Eqs. (5) and (6) lead immediately to the simple result

\[
E = \frac{(n - 4)pq + 1}{(n + 2)^2}.
\]

Let us consider first the case n = 4. Then E ceases to depend on p and becomes 
equal to 1/36. This circumstance makes the statistician who adopted the method 
p’ = (m + 1)/6 independent of all diabolic cunning: whatever D does, whether 
he sticks to one coin or changes them as he likes, S will lose in the long run 1/36 
as his mean expenditure, never more nor less. It follows therefrom that any 
method is the best for D as an answer to the method p’ = (m + 1)/6. In par- 
ticular, the method defined by the density f(p) = 1 is such a relatively best 
method against the statistician’s rule p’ = (m + 1)/6. What is, on the other 
hand, the statistician’s adequate answer to the devil’s rule f(p) = 1? This answer 
has already been found in Sec. 2: g(m, n) = (m + 1)/(n + 2), which is equal 
in our special case n = 4 to (m + 1)/6. Thus we have two antagonistic methods 
for D and S, each of them being relatively best against the other. We can easily 
prove that the rule of guessing p’ = (m + 1)/6 is not only relatively best but 
also absolutely best, i.e., that it fulfils the minimax criteria for best strategies. 
The proof runs as follows: The rule p’ = (m + 1)/6 reduces the loss of S to 
1/36 whatever D does. On the other hand, the same rule is the best answer 
against D’s employing the system f(p) = 1. This last statement implies the im- 
possibility of S’s reducing his loss below 1/36 against the devil’s playing as men- 
tioned. Therefore no method gives a certainty to S of a loss below 1/36; as the 
rule above gives a guarantee of a loss equal to 1/36, no rule is better in the mini- 
max sense, Q.E.D. The devil’s rule f(p) = 1 is also absolutely best for D, a 
theorem which can also be proved almost immediately. Both proofs are superflu- 
ous for a reader knowing the general theorem about games that two methods 
relatively best against each other are also absolutely best for the opponents em- 
ploying them. The existence of such a pair of methods shows, in our terminology, 
that the game is closed. The result of their being constantly applied by both 
sides is a mean loss equal to the value of the game, which is 1/36 in our case. 
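
The claim that the mean loss of the rule p’ = (m + 1)/(n + 2) is ((n − 4)pq + 1)/(n + 2)², and hence constant and equal to 1/36 for n = 4, can be checked by summing (5) directly. The sketch below does this in exact rational arithmetic; the function names `mean_loss`, `laplace_rule` and `closed_form` are illustrative choices of ours, not notation from the paper.

```python
from fractions import Fraction
from math import comb


def mean_loss(rule, n, p):
    """Expected squared error of the rule m -> rule(m, n) for one fixed coin p."""
    q = 1 - p
    return sum(
        comb(n, m) * p**m * q**(n - m) * (p - rule(m, n)) ** 2
        for m in range(n + 1)
    )


def laplace_rule(m, n):
    """The guessing method (3): p' = (m + 1)/(n + 2)."""
    return Fraction(m + 1, n + 2)


def closed_form(n, p):
    """The simple result following (5) and (6): ((n - 4)pq + 1)/(n + 2)^2."""
    return ((n - 4) * p * (1 - p) + 1) / Fraction((n + 2) ** 2)


# For n = 4 the loss does not depend on the coin chosen by the devil.
for p in (Fraction(0), Fraction(1, 3), Fraction(1, 2), Fraction(9, 10), Fraction(1)):
    assert mean_loss(laplace_rule, 4, p) == Fraction(1, 36)

# For other n the direct sum agrees with the closed form.
assert mean_loss(laplace_rule, 7, Fraction(1, 3)) == closed_form(7, Fraction(1, 3))
```

The coins are passed as `Fraction` values so that every comparison is exact.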


Thus we have solved our problem of the best estimate for n = 4. The solution is
p’ = (m + 1)/6.

Let us try to free ourselves from the restriction n = 4 by setting more
generally,

\[
p' = g(m, n) = \frac{m + a}{n + b} \tag{7}
\]

and assuming the independence of a and b from m but not from n. To compute E
in this case we have to write

\[
E = \sum_{m=0}^{n} \binom{n}{m} p^m q^{n-m} \left(p - \frac{m + a}{n + b}\right)^2 \tag{8}
\]

instead of (5), and to repeat almost verbally the trivial computations of the
special case a = 1, b = 2; they give now

\[
E = \frac{npq + (bp - a)^2}{(n + b)^2} \qquad (q = 1 - p). \tag{9}
\]


The numerator np(1 − p) + (bp − a)² ceases to depend on p just if we put
a = √n/2, b = √n. These values when substituted in (7) for a and b yield the
formula

\[
p' = g(m, n) = \frac{m + \sqrt{n}/2}{n + \sqrt{n}} = G(m, n) \tag{10}
\]

as a guessing method for S. When substituted in (9), they give

\[
A = \frac{1}{4(\sqrt{n} + 1)^2} \tag{11}
\]

as the mean loss of the statistician, whatever the sequence of coins employed by
the devil. To follow the path traced in the special case n = 4, we have to find a
density f(p) which satisfies (2) when we write for g(m, n) the function G defined
by (10). Let us try to this effect

\[
f(p) = c\,(pq)^s \quad (s > -1) \qquad \text{with} \qquad 1/c = \int_0^1 (pq)^s \, dp.
\]


The integrals in (2) now take the form

\[
c \int_0^1 p^{m+s+1} q^{n-m+s} \, dp, \qquad c \int_0^1 p^{m+s} q^{n-m+s} \, dp,
\]

and their respective values are

\[
c\,\frac{\Gamma(s + m + 2)\,\Gamma(s + n - m + 1)}{\Gamma(2s + n + 3)}, \qquad
c\,\frac{\Gamma(s + m + 1)\,\Gamma(s + n - m + 1)}{\Gamma(2s + n + 2)}.
\]

Their ratio is (s + m + 1)/(2s + n + 2), and condition (2) becomes

\[
\frac{m + s + 1}{n + 2s + 2} = \frac{m + \sqrt{n}/2}{n + \sqrt{n}}. \tag{12}
\]

To satisfy (12), we set s = √n/2 − 1 and get

\[
f(p) = (pq)^{\sqrt{n}/2 - 1}\, \frac{\Gamma(\sqrt{n})}{\bigl(\Gamma(\sqrt{n}/2)\bigr)^2} \tag{13}
\]


as the density sought to close the game. Thus the method G defined by (10) is 
the solution of our problem. It is the best estimate of p because it minimizes the 
mean square error. The case n = 4 is obviously contained as a special case in 
(10); it is the only case in which the hypothesis of uniform distribution a priori is 
consistent, in a certain sense at least, with our theory. 
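
The two facts used in this section, that the rule G of (10) has a mean loss independent of p and that G coincides with the ratio (2) for the density (13), lend themselves to a short numerical sketch. The names `game_value` and `ratio_for_power_density` below are hypothetical helpers of ours; floating-point arithmetic is used because √n is irrational in general.

```python
from math import comb, isclose, sqrt


def G(m, n):
    """The guessing method (10)."""
    return (m + sqrt(n) / 2) / (n + sqrt(n))


def mean_loss(rule, n, p):
    """Expected squared error of the rule m -> rule(m, n) for one fixed coin p."""
    return sum(
        comb(n, m) * p**m * (1 - p) ** (n - m) * (p - rule(m, n)) ** 2
        for m in range(n + 1)
    )


def game_value(n):
    """The constant (11): 1 / (4 (sqrt(n) + 1)^2)."""
    return 1 / (4 * (sqrt(n) + 1) ** 2)


def ratio_for_power_density(m, n, s):
    """Ratio (2) for f(p) = c (pq)^s, i.e. (s + m + 1)/(2s + n + 2)."""
    return (s + m + 1) / (2 * s + n + 2)


n = 10
# The loss of G is the same for every coin the devil may choose.
for p in (0.0, 0.1, 0.5, 0.83, 1.0):
    assert isclose(mean_loss(G, n, p), game_value(n), rel_tol=1e-9)

# With s = sqrt(n)/2 - 1 the ratio (2) reproduces G, as condition (12) requires.
s = sqrt(n) / 2 - 1
for m in range(n + 1):
    assert isclose(ratio_for_power_density(m, n, s), G(m, n), rel_tol=1e-12)

# The naive rule m/n does worse than the game value at p = 1/2.
assert mean_loss(lambda m, n: m / n, n, 0.5) > game_value(n)
```

The last assertion only illustrates the minimax comparison at one coin; for coins near 0 or 1 the rule m/n has the smaller loss, which is why the worst case is the relevant yardstick here.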

It is interesting to note that the classical rules, Bayes’ hypothesis of uniform
prior distribution and the estimate m/n, can both be saved if (p − p’)²/pq is
the amount to be paid by S to D. This is easily verified by writing a = 0, b = 0
in (7) and dividing (9) by pq to compute the expected gain of D in the new game;
it is 1/n and thus independent of p. On the other hand, if we put f(p) = 1 in
(1) and replace (p − x)² by (p − x)²/pq to satisfy the new rules of the game,
we are led by the same reasoning as employed for the old rule to the formula

\[
x_m = \int_0^1 p^{m} q^{n-m-1} \, dp \Big/ \int_0^1 p^{m-1} q^{n-m-1} \, dp = m/n,
\]

instead of (2).
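
A minimal sketch of the first half of this remark: under the modified stake (p − p’)²/pq, the rule p’ = m/n has expected loss exactly 1/n for every coin with 0 < p < 1. The function name `weighted_mean_loss` is ours and purely illustrative.

```python
from fractions import Fraction
from math import comb


def weighted_mean_loss(n, p):
    """Expected value of (p - m/n)^2 / (pq) for a coin with 0 < p < 1."""
    q = 1 - p
    squared_error = sum(
        comb(n, m) * p**m * q**(n - m) * (p - Fraction(m, n)) ** 2
        for m in range(n + 1)
    )
    return squared_error / (p * q)


# The expected gain of D in the new game is 1/n whatever coin he uses.
for p in (Fraction(1, 10), Fraction(1, 2), Fraction(7, 9)):
    assert weighted_mean_loss(8, p) == Fraction(1, 8)
```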


5. The method of S given by G holds the devil tightly by his horns: he can 
neither increase nor diminish his mean gain which is equal to A as defined by 
(11). A is the value of the game and, consequently, the indemnity which, when
paid in advance by D to S in every round, makes the whole game fair: this 
signifies that the mean balance of the sequence of rounds tends to zero with prob- 
ability 1 if this indemnity is agreed upon and both partners play as well as pos- 
sible. If S adheres to G the devil’s lack of skill does not alter the limit. There 
are, however, bad methods for D (i.e., bad densities f(p)); if S knew that D has 
chosen such a method, he could apply a method securing himself a positive mean 
balance. 

The expression 


\[
\sqrt{A} = \frac{1}{2(\sqrt{n} + 1)} \tag{14}
\]


may be called the mean error of the estimate (10). We emphasize that it is de- 
pendent exclusively on n. This enables the scientist to plan his experiments with 
a desired accuracy: he gets by an adequate choice of n the accuracy desired, 
neither more nor less, which is very important (e.g., the estimation of a lot by 
counting good and defective items in a sample of n has to avoid samples too 
great and too small: the first cost too much, the second give insufficient informa- 
tion). To appreciate this feature of the estimate (10) we may compare it with the 
“confidence interval” method where the length of the interval, given the
confidence level, is known only after the experiment, because it depends on both
n and m. Finally, our formulae are extremely simple, and can easily be tabulated 
for practical purposes. 
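
Since the mean error (14) depends on n alone, the planning step described above amounts to inverting 1/(2(√n + 1)) ≤ ε for n. The sketch below does so; `tosses_needed` is a hypothetical helper name, and the two correcting loops are only a guard against floating-point rounding in the closed-form starting value.

```python
from math import ceil, sqrt


def mean_error(n):
    """The expression (14): 1 / (2 (sqrt(n) + 1))."""
    return 1 / (2 * (sqrt(n) + 1))


def tosses_needed(eps):
    """Smallest n with mean_error(n) <= eps, assuming 0 < eps < 0.5."""
    # Solving 1 / (2 (sqrt(n) + 1)) <= eps gives n >= (1/(2 eps) - 1)^2.
    n = max(0, ceil((1 / (2 * eps) - 1) ** 2))
    # Guard against rounding in the closed form above.
    while n > 0 and mean_error(n - 1) <= eps:
        n -= 1
    while mean_error(n) > eps:
        n += 1
    return n


n = tosses_needed(0.03)
assert mean_error(n) <= 0.03 < mean_error(n - 1)
```

For example, a desired mean error of 0.05 leads to (1/0.1 − 1)² = 81 observations, so `tosses_needed(0.05)` gives 81.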

6. Our estimate G is free from the disadvantage of “mixed strategies”
anticipated at the end of Sec. 3. We may put the question of its uniqueness. The
proof that there is no other estimate equivalent to G is simple. It is stated, in
more general form, in [4], where the estimate was first announced; I happened to
hear it first from Stanisław Trybuła, who rediscovered it in connection with [11],
and we have the author’s consent to repeat his proof here. It has already been
proved here that there is no better estimate; now here is the proof of
uniqueness.

We have already stated in Sec. 2 that the solution (2) of the minimum problem
for the integral I(x, m) defined in (1) is unique:

\[
I(x, m) > I(x_m, m) \quad \text{for every } x \text{ different from } x_m. \tag{15}
\]


Suppose that S adopts any method M of the form g(m, n) different from G.
Let the devil choose the method defined by (13). Since this density has been
shown to give G when substituted in (2), we see by (15) that there is at least one
subsequence Q_m giving to D a greater gain in the mean than the gain he would
obtain if S applied the method G: this subsequence is generated by the subscript
m for which M differs from G(m, n); there is at least one such value m among
0, 1, ..., n. For any values m for which M equals G(m, n) the corresponding
subsequences Q_m give the same gain to D as he would gather if opposed by G(m, n).
Thus S will suffer in the mean over all rounds played a loss greater than A, if
he applies M and is opposed by (13) as the devil’s method. Thus M is not a
minimax solution, which proves the uniqueness of our minimax solution G(m, n).

The question of uniqueness of the devil’s best method has a different answer. 
It too has been found by Trybuła [11], and it is negative. There are densities
different from (13) that are equivalent to it as gambling methods in every respect. 
They are, however, not so simple. 

We have already seen that there does not exist a single coin p₀ as a “rigid
method” for D; the reason is the possibility of S’s choosing p’ = p₀ if D’s method
is to always use the same coin p₀. Nevertheless, there are other methods which
correspond neither to densities nor to constants or, to speak more generally, which
have no distributions. We have omitted taking them into account because our
argument led first to a method G for S which made his mean loss independent of
all methods of D, the distributionless methods included, and we had only to 
find one definite method for D which could never be better answered than by G. 
Thus we can admit all possible methods of D, having a distribution or not; the 
theory expounded remains valid without restrictions. As the devil represents 
nature, we have no reasons to jeopardize his freedom; it is an advantage for S, 
the human player, that our theory enables him to master the situation in the 
general case. 

In all cases, when the existence of limits is asserted, the assertions are true 
with probability one. There are other points where we have sinned against the
professional vocabulary, e.g., when speaking about the “mean error”; it is rather
a metaphor: we always think of the expression

\[
\lim_{N \to \infty} \frac{1}{N} \sum_{j=1}^{N} \bigl(p^{(j)} - p'^{(j)}\bigr)^2.
\]

For practical purposes this limit is a better measure for the error than any having
the disadvantage of being dependent on m and of becoming 0 for m = 0 and for
m = n. These disadvantages have shocked practitioners and compelled them to
leave the hard soil of sound statistics ([7], [9]).

Our method may be criticized as leading to a biased estimate. This means,
in plain words, that in the case of estimating the same quality p over and over,
say r times, we get r values m_j (j = 1, ..., r) and r estimates

\[
p'^{(j)} = \frac{m_j + \sqrt{n}/2}{n + \sqrt{n}}
\]

whose mean (1/r) Σ_{j=1}^{r} p’^{(j)} does not tend to p for r → ∞. This is,
however, an artificial situation. If we have different lots L_j we are not entitled
to suppose p^{(j)} the same for all lots and, to be on the safe side, we must give
separate estimates for each lot. If, however, we have to deal with the same lot
r times, we will not compute its quality by taking the mean of the estimates but
we will employ the formula (10) with m = Σ_{j=1}^{r} m_j, writing

\[
p_r = \frac{m + \sqrt{nr}/2}{nr + \sqrt{nr}}.
\]

The estimate p_r has, obviously, the property that lim_{r→∞} p_r = p.

Our method is essentially based on the measure of the error given by (p − p’)².
Of course, this measure is by no means the only possible one. In some applica- 
tions the minimizing of the average error | p — p’ | would be more realistic; in 
some cases even some asymmetric function of the difference p — p’ could claim 
to be the best measure of the error in view of the economic consequences of a 
bad estimate which ought to be minimized. Notwithstanding this objection, we 
do not see any possibility of covering all the statistical applications by a single 
formula and must either abstain from giving any general rule or adopt the princi- 
ple of least squares as being by far the best known and most applied. Its simplicity 
is inherited by our formula; no tables are necessary to employ it for small n, 
and there is no distinction between small and large n to be taken care of. 


7. The comparison of our estimate (10) and that called the “confidence in- 
terval” [6] leads to the following remarks. 

The confidence interval is defined by a “confidence level” and by the postulate
of being the shortest one, given m, n and the confidence level. It is not a direct 
estimate of the unknown p. In many cases, however, such a direct estimate is 
necessary; let us think, for instance, of selling two kinds of some produce on the
basis of a first experience which gave m and n − m buyers respectively for kinds
A and B; the producer is compelled to choose a definite p’ so that he may include
in the next shipment the proper numbers of items A and B. Nevertheless we could
consider our estimate as the middle point of an interval of the length
1/(√n + 1). This length being the double of the expression (14), we may risk
the conjecture that the confidence level for it is 0.6827... for n greater than, let
us guess, 30. Thus our estimate would appear as a special case of the classical
confidence interval: the case of confidence level 0.6827. This is, however, an
illusion, because the classical confidence interval has a length depending
on m, which is not true for the length 1/(√n + 1). We have already pointed out in Sec. 5
the importance of this independence. 

The remarks above lead to the problem of modifying the so-called confidence 
interval by making its length independent of m. To speak again in the language 
of the theory of games, it would be necessary to let the devil and the statistician 
fix n and an arbitrary t (0 < t ≤ 1) and let them gamble, the devil choosing the
coin to be tossed and the statistician placing an interval J of length t on the
real axis after having seen the result of n tossings. The stake would be $1 paid
by D to S if p is covered by J and by S to D in the opposite case. The problem
for S would consist in placing the middle-point P of J as well as possible. He
would have to find a formula P(m, n; t) for the best location of P. This formula
would be the statistician’s best method of playing and would lead eventually
to a value of the game, V(n, t), independent of p. Solving the equation V(n, t) = v
we would get n = A(t, v); the function A(t, v) would answer the following problem:
given the confidence level v and the length t, to determine the number n of experi-
ments needed to include the unknown p in an interval of length t with a confidence
v. We have not tried to solve this problem; we mention it here to show distinctly 
the difference between the theory of Neyman’s confidence intervals and our line 
of approach. 
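The conjecture about the level 0.6827 can be probed numerically. The following is a minimal sketch, not part of the original argument: it assumes formula (10) in the form $p' = (m + \sqrt{n}/2)/(n + \sqrt{n})$ and computes the exact binomial probability that the interval of length $1/(\sqrt{n}+1)$ centred at $p'$ covers p. The function names are illustrative only.

```python
from math import comb, sqrt


def minimax_estimate(m, n):
    """Formula (10), assumed form: p' = (m + sqrt(n)/2) / (n + sqrt(n))."""
    return (m + sqrt(n) / 2) / (n + sqrt(n))


def coverage(n, p):
    """Exact probability that the interval of length 1/(sqrt(n)+1),
    centred at the minimax estimate, contains the true p."""
    half_length = 0.5 / (sqrt(n) + 1)
    total = 0.0
    for m in range(n + 1):
        # small tolerance so that boundary cases are counted as covered
        if abs(minimax_estimate(m, n) - p) <= half_length + 1e-12:
            total += comb(n, m) * p**m * (1 - p) ** (n - m)
    return total


if __name__ == "__main__":
    for p in (0.1, 0.3, 0.5):
        print(p, round(coverage(100, p), 4))
```

Because the interval length does not depend on m, the coverage varies with p; the sketch only illustrates how such a fixed-length interval behaves and does not solve the game V(n, t) described above.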

We pass now to another question; it is the last of the three examples spoken 
of in the introduction. 

Instead of the simple alternative, or dichotomy, for which the solution has 
already been described, we have here four possible results of tossing. Let us 
more generally suppose that there are k possible outcomes, k > 2. The coin 
must be replaced by a k-hedron, every face being painted differently from the 
others; D knows the respective probabilities $p_i$ (i = 1, 2, ..., k) for all faces;
he has in his collection a model for every set $\{p_i\}$; obviously $\sum_{i=1}^{k} p_i = 1$. The
result of n tossings with a definite k-hedron is a set $\{m_i\}$; $m_i$ (i = 1, 2, ..., k)
are the respective numbers of tossings resulting in the ith face up; obviously
$\sum_{i=1}^{k} m_i = n$. S has to estimate $\{p_i\}$ knowing $\{m_i\}$; his estimate $\{p_i'\}$ has to
minimize the expected value of

(16)    $\sum_{i=1}^{k} (p_i - p_i')^2.$

He tries to do it by putting

(17)    $p_i' = \dfrac{m_i + a}{n + b} \qquad (i = 1, 2, \ldots, k).$

We now have, instead of (5), the expression

(18)    $E = \sum_{\{m\}} \dfrac{n!}{m_1!\, m_2! \cdots m_k!}\; p_1^{m_1} p_2^{m_2} \cdots p_k^{m_k} \left\{ \left(p_1 - \dfrac{m_1 + a}{n + b}\right)^2 + \cdots + \left(p_k - \dfrac{m_k + a}{n + b}\right)^2 \right\},$

the summation being extended over all systems of nonnegative integers $m_i$ with
$\sum_{i=1}^{k} m_i = n$. Let us denote by $E_1$ the part of E which is left after cancelling
all squares in { } in (18) but the first. If we denote $m_1$ by m, $p_1$ by p, and
$\sum_{i=2}^{k} p_i$ by q we get immediately the same expression for $E_1$ that we have called
E in (8). This gives, in view of (9),

$E_1 = \dfrac{1}{(n + b)^2} \left( n p_1 (1 - p_1) + (b p_1 - a)^2 \right).$

The same applies to each of the k parts $E_i$ of (18) and leads, eventually, to

(19)    $E = \sum_{i=1}^{k} E_i = \dfrac{1}{(n + b)^2} \sum_{i=1}^{k} \left\{ n p_i (1 - p_i) + (b p_i - a)^2 \right\} = \dfrac{1}{(n + b)^2} \left\{ n + (b^2 - n) \sum_{i=1}^{k} p_i^2 - 2ab + k a^2 \right\}.$


This expression becomes independent of the variables $p_i$ if and only if we put
$b = \sqrt{n}$; it then takes the form

(20)    $E = \dfrac{1}{(n + \sqrt{n})^2} \left( n - 2a\sqrt{n} + k a^2 \right).$

To determine a we introduce the condition

(21)    $\sum_{i=1}^{k} p_i' = 1,$

obviously needed in most applications. Thus we get from (17) with the value
$b = \sqrt{n}$

(22)    $1 = \sum_{i=1}^{k} \dfrac{m_i + a}{n + b} = \dfrac{n + ka}{n + \sqrt{n}},$

and, from (22), $a = \sqrt{n}/k$. We note with gratification that this value of a
necessary to insure (21) is also just the value needed in (20) to minimize E
without regard for (21). The formulae (17) now become

(23)    $p_i' = \dfrac{m_i + \sqrt{n}/k}{n + \sqrt{n}} \qquad (i = 1, 2, \ldots, k).$

Putting $a = \sqrt{n}/k$ into (20) we get

(24)    $E = \dfrac{n}{(n + \sqrt{n})^2} \cdot \dfrac{k - 1}{k} = \dfrac{k - 1}{k\,(1 + \sqrt{n})^2},$

an expression independent of $\{p_i\}$.

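The independence of the loss from $\{p_i\}$ can be verified by brute force for small n and k. The sketch below is an illustration only: it enumerates every outcome $\{m_i\}$ of n tossings, applies the estimates (23), and compares the exact expected value of (16) with the closed form $(k-1)/[k(1+\sqrt{n})^2]$. The function names are hypothetical.

```python
from itertools import product
from math import factorial, prod, sqrt


def minimax_estimates(counts):
    """Estimates (23): p_i' = (m_i + sqrt(n)/k) / (n + sqrt(n))."""
    n, k = sum(counts), len(counts)
    return [(m + sqrt(n) / k) / (n + sqrt(n)) for m in counts]


def expected_loss(p, n):
    """Exact expected value of sum_i (p_i - p_i')^2 over all multinomial outcomes."""
    k = len(p)
    total = 0.0
    for counts in product(range(n + 1), repeat=k):
        if sum(counts) != n:
            continue
        coef = factorial(n)
        for m in counts:
            coef //= factorial(m)          # multinomial coefficient
        prob = coef * prod(pi**mi for pi, mi in zip(p, counts))
        est = minimax_estimates(counts)
        total += prob * sum((pi - ei) ** 2 for pi, ei in zip(p, est))
    return total


def constant_risk(n, k):
    """Closed form of the loss after putting a = sqrt(n)/k into (20)."""
    return (k - 1) / (k * (1 + sqrt(n)) ** 2)


if __name__ == "__main__":
    for p in [(0.2, 0.3, 0.5), (0.6, 0.3, 0.1), (1 / 3, 1 / 3, 1 / 3)]:
        print(p, expected_loss(p, 6), constant_risk(6, 3))
```

The enumeration grows as $(n+1)^k$, so it is meant only as a check for small cases.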
To attain our objective by pursuing the analogy with the case k = 2, we must
find a density on the hyperplane

(25)    $\sum_{i=1}^{k} p_i = 1 \qquad (0 \le p_i \le 1;\ i = 1, 2, \ldots, k),$

for which the estimates (23) minimize the expected value of $\sum_{i=1}^{k} (p_i - p_i')^2$.
Let us try the density $f(p_1, p_2, \ldots, p_k) = C (p_1 p_2 \cdots p_k)^s$, where the constants C
and s will be determined later. Supposing that the statistician knows that the
frequencies of k-hedra employed successively by the devil are distributed
according to this density (each point on the hyperplane (25) defines a k-hedron
and the converse is also true), he will have to determine his guesses $\{x_i\}$, after
having observed the result $\{m_i\}$, so as to minimize the expected loss that is given
by the positive quadratic form

(26)    $F(x_1, x_2, \ldots, x_k) = C\, \dfrac{n!}{m_1! \cdots m_k!} \int \cdots \int (p_1 \cdots p_k)^s\, p_1^{m_1} \cdots p_k^{m_k} \sum_{i=1}^{k} (x_i - p_i)^2 \, d\omega.$

The integral in (26) is extended over the domain defined by (25) and $d\omega$ is the
element of the hyperplane (25). The minimum problem is equivalent to the set
of equations $\partial F / \partial x_i = 0$ (i = 1, 2, ..., k) which are obviously

(27)    $\int \cdots \int (p_1 p_2 \cdots p_k)^s\, p_1^{m_1} p_2^{m_2} \cdots p_k^{m_k}\, (p_i - x_i)\, d\omega = 0 \qquad (i = 1, 2, \ldots, k)$

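and the method of guessing defined by (23) will be proved best in the minimax
sense if the set $\{x_i\}$ solving (27) can be identified with the set $\{p_i'\}$ given by (23).
This amounts to k conditions of which the first is

(28)    $\dfrac{\int \cdots \int p_1^{m_1 + s + 1}\, p_2^{m_2 + s} \cdots p_k^{m_k + s}\, d\omega}{\int \cdots \int p_1^{m_1 + s}\, p_2^{m_2 + s} \cdots p_k^{m_k + s}\, d\omega} = \dfrac{m_1 + \sqrt{n}/k}{n + \sqrt{n}}.$

The numerator L and the denominator M in (28) can be brought into the forms

(29)    $L = \dfrac{\Gamma(m_1 + s + 2)\, \Gamma(m_2 + s + 1) \cdots \Gamma(m_k + s + 1)}{\Gamma(n + k(s + 1) + 1)}, \qquad M = \dfrac{\Gamma(m_1 + s + 1)\, \Gamma(m_2 + s + 1) \cdots \Gamma(m_k + s + 1)}{\Gamma(n + k(s + 1))},$

respectively, if the classical identity [13]

(30)    $\int \cdots \int p_1^{\alpha_1 - 1} p_2^{\alpha_2 - 1} \cdots p_k^{\alpha_k - 1}\, d\omega = \dfrac{\Gamma(\alpha_1)\, \Gamma(\alpha_2) \cdots \Gamma(\alpha_k)}{\Gamma(\alpha_1 + \alpha_2 + \cdots + \alpha_k)}$

is applied to both of them. (29) immediately yields

(31)    $\dfrac{L}{M} = \dfrac{m_1 + s + 1}{n + k(s + 1)}$

and the condition (28) leads, through (31), to

(32)    $\dfrac{m_1 + s + 1}{n + k(s + 1)} = \dfrac{m_1 + \sqrt{n}/k}{n + \sqrt{n}},$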

which can be satisfied by setting $s + 1 = \sqrt{n}/k$; since the same reasoning
applies to all k conditions of which (28) is the first, all will be satisfied by the
value chosen for s. As to the constant C it can now be found from the condition

(33)    $C \int \cdots \int_{p_i \ge 0,\ p_1 + \cdots + p_k = 1} (p_1 p_2 \cdots p_k)^{\sqrt{n}/k - 1}\, d\omega = 1 \qquad (k > 2)$

which gives

(34)    $C = \dfrac{\Gamma(\sqrt{n})}{[\Gamma(\sqrt{n}/k)]^k}$

and, finally,

(35)    $\dfrac{\Gamma(\sqrt{n})}{[\Gamma(\sqrt{n}/k)]^k}\, (p_1 p_2 \cdots p_k)^{\sqrt{n}/k - 1}$
is the density sought. If it is employed by the devil in an infinite game where the 
payment (16) in every round is agreed upon, the statistician’s best answer is 
defined by (23), whether or not S is bound to respect the condition (21). This 
statement has just been proved; it enables us to apply once more the argument 
employed in the case k = 2 and to recognize the estimates (23) as the best in
the minimax sense. The uniqueness of our solution follows by the reasoning 
already applied (for k = 2) in Sec. 6: as F in (26) shares with J in (1) the property 
of attaining its minimum at a single point, no modifications of the former 
reasoning are necessary. 
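The identification of the Bayes guess with the estimate (23) can also be checked numerically. The sketch below is an illustration under the assumption that the simplex integral of $\prod_i p_i^{e_i}$ equals $\prod_i \Gamma(e_i + 1) / \Gamma(\sum_i e_i + k)$, the standard Dirichlet form of the classical identity; it evaluates the ratio L/M of (28) with $s + 1 = \sqrt{n}/k$ through log-gamma functions and compares it with (23). All function names are hypothetical.

```python
from math import exp, lgamma, sqrt


def log_simplex_integral(exponents):
    """log of the integral of prod p_i^{e_i} over the simplex,
    taken to be prod Gamma(e_i + 1) / Gamma(sum e_i + k)."""
    k = len(exponents)
    return sum(lgamma(e + 1) for e in exponents) - lgamma(sum(exponents) + k)


def bayes_guess(counts, i):
    """Ratio L/M of (28) for face i under the prior density C (p_1 ... p_k)^s."""
    n, k = sum(counts), len(counts)
    s = sqrt(n) / k - 1                    # the value s + 1 = sqrt(n)/k
    base = [m + s for m in counts]         # exponents in the denominator M
    raised = list(base)
    raised[i] += 1                         # numerator L carries one extra power of p_i
    return exp(log_simplex_integral(raised) - log_simplex_integral(base))


def minimax_estimate(counts, i):
    """Estimate (23) for face i."""
    n, k = sum(counts), len(counts)
    return (counts[i] + sqrt(n) / k) / (n + sqrt(n))


if __name__ == "__main__":
    counts = (3, 0, 5, 1)
    for i in range(len(counts)):
        print(i, bayes_guess(counts, i), minimax_estimate(counts, i))
```

Working with `lgamma` rather than the gamma function itself avoids overflow when the counts are large.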


8. If we drop the condition (21) we can write p for $p_1$, q for 1 − p and seek
with S the best estimate of $p_1$ alone by the theory of the dichotomous case.
This leads immediately to the solution (10) which may be written adequately

$p_1' = \dfrac{m_1 + \sqrt{n}/2}{n + \sqrt{n}};$

we can, of course, replace the subscript 1 on both sides by i (i = 1, 2, ..., k).
We will have for the total loss (16) the mean kA, where A is given by (11). On
the other hand, the loss has been computed in (24) under condition (21) as
being equal to 4[(k − 1)/k]A; since 4[(k − 1)/k] is less than k for k > 2, we
arrive at the paradox that S can reduce his loss in the general case k > 2 by
accepting the obligation (21). The explanation lies in the circumstance that the
loss A has been computed as minimax, taking into account the possibility of
D’s playing his best method, which has been found to be the density $c(pq)^{\sqrt{n}/2 - 1}$.
In the general case he can not apply the same method for every face of the 
k-hedron because he would have to choose a density $f(p_1, p_2, \ldots, p_k)$ that
would have the form $c_i p_i (1 - q_i)$ for i = 1, 2, ..., k simultaneously, which is
impossible. Thus the plurality of faces hinders the devil from simultaneously 
playing all the alternative games contained in the general case as well as possible. 
This obstacle for D opens the way for S to play every alternative game in the 
general case better than he could do it in the simple alternative k = 2; the in- 
equality 4[(k — 1)/k] < k is the proof that the advantage thus gained by S 
is greater than the disadvantage that results from the condition (21). In fact 
we remarked just after (22) that this disadvantage is purely illusory. 
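The inequality invoked in this section can be checked in one line. As a worked step (assuming, as (24) implies, that $A = 1/[4(1 + \sqrt{n})^2]$):

```latex
k - \frac{4(k-1)}{k} \;=\; \frac{k^{2} - 4k + 4}{k} \;=\; \frac{(k-2)^{2}}{k} \;\ge\; 0,
\qquad\text{so}\qquad
kA - \frac{4(k-1)}{k}\,A \;=\; \frac{(k-2)^{2}}{k}\,A .
```

Equality holds only for k = 2; for k = 3, for instance, the two losses are 3A and (8/3)A.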


9. The method of this paper applies as well to other distributions as to the 
binomial. The simplest example is given by the Poisson distribution. The problem 
is to estimate the mean number of signals per unit time, having observed an 
arbitrary unit interval and having found k signals in it. To apply the theory of
games, we must define the amount to be paid for an erroneous estimate. If, to
mention an example of analytical interest, this amount is equal to $(c - c')^2/c$,
c being the true mean number and c’ the estimate, the problem is easily solved.

Let us try the guess c’ = k. The expected loss for a given c is

(36)    $E = \sum_{k=0}^{\infty} \dfrac{(c - k)^2}{c}\, e^{-c}\, \dfrac{c^k}{k!} = c M_0 - 2 M_1 + M_2 / c,$

the $M_i$ (i = 0, 1, 2) being the moments of order i of the Poisson distribution;
this leads to

(37)    $E = c - 2c + (w + c^2)/c = 1,$

w being the variance of the random variable k, which in this instance is c. Thus
the loss E does not depend on c. We now have to find a prior distribution which
leads to the method c’ = k as the relatively best. We will show that the uniform
distribution does so. To speak correctly, it will be the distribution of constant
density $T/(T^2 - 1)$ (T > 1) in (1/T, T) for T → ∞. We have to compute x
for a given k so as to minimize the expected loss given by

(38)    $\dfrac{T}{T^2 - 1} \int_{1/T}^{T} \dfrac{e^{-c} c^k}{k!}\, \dfrac{(c - x)^2}{c}\, dc,$

x being the guess we are seeking. The condition for the minimum is evidently

$\int_{1/T}^{T} e^{-c} c^{k}\, dc = x \int_{1/T}^{T} e^{-c} c^{k-1}\, dc,$

and becomes, for T → ∞, k! = x (k − 1)!, x = k, Q.E.D.
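The constancy of the loss under the guess c’ = k is easy to confirm numerically. The sketch below is an illustration only; it truncates the Poisson series at a hypothetical cutoff `kmax` chosen large enough for the values of c tried, and evaluates the weighted loss $(c - k)^2/c$ alongside the unweighted squared error.

```python
from math import exp


def poisson_probabilities(c, kmax):
    """P(k) for k = 0..kmax, built iteratively to avoid large factorials."""
    probs = [exp(-c)]
    for k in range(1, kmax + 1):
        probs.append(probs[-1] * c / k)
    return probs


def weighted_loss(c, kmax=400):
    """Expected value of (c - k)^2 / c under the guess c' = k, as in (36)."""
    return sum(pk * (c - k) ** 2 / c
               for k, pk in enumerate(poisson_probabilities(c, kmax)))


def squared_error_loss(c, kmax=400):
    """Expected value of (c - k)^2, which equals the Poisson variance c."""
    return sum(pk * (c - k) ** 2
               for k, pk in enumerate(poisson_probabilities(c, kmax)))


if __name__ == "__main__":
    for c in (0.5, 3.0, 20.0):
        print(c, weighted_loss(c), squared_error_loss(c))
```

The weighted loss stays at 1 for every c, while the unweighted squared error grows in proportion to c.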

It is interesting to note that no minimax solution exists if $(c - c')^2$ is fixed
as the amount to be paid by S to D. The reason for this insolubility is to be
sought in the freedom of the devil to increase c as far as he likes, putting for
instance $c_j = j$ or $c_j = 10^j$ in the jth round. It is easily seen that any guess c’
far from k is worse for S than c’ = k; on the other hand the expected value of
$(c - k)^2$ is c, rising thus with c beyond every limit and increasing indefinitely
the gain expected by D, whatever S does. The schedule of payments $(c - c')^2/c^2$
is also of practical interest, corresponding as it does to a penalty based on per- 
centage error. Here again it is easy to see that no minimax solution exists. 


10. Different objections to the new estimate have been raised: 

1° The loss-function $(p - p')^2$ has been criticized as arbitrary. We have pointed
out in the introduction that it is the result of an extrapolation of the estimate 
Ep of the old school; such a procedure is suggested by the circumstance that the 
symbol E loses its direct applicability when the prior distribution of p is either 
unknown or nonexistent. 

2° The old estimate m/n has been compared with (10) as giving a better 
result, especially for n not too small. It has, however, been rejected not by 
mathematicians but by men of practice [7], who became aware of the paradox 
of error 0 for m = 0,m = n. 

3° To save the minimax method, the Bayes hypothesis f(p) = 1, and the estimate
m/n, the loss-function $(p - p')^2/pq$ has been proposed, but no examples
have been adduced for which this loss function seems realistic. There is also
some interest in the loss function $(p - p')^2/(pq)^2$, for which no minimax estimate
exists. There are examples in which this function corresponds to the real
situation for p near 0 or for p near 1, but it is difficult to find natural problems 
in which the function measures the damage caused by bad guessing throughout 
the whole interval 0 < p < 1. 

4° The concept of a mathematically trained and satanically malicious devil 
has been denounced as leading to unnecessary caution; the estimate (10) is 
accused of paying too high a tribute to nature, which is incapable of seeking 
out such sophisticated devices against the mortals as given by (13). This ob- 
jection may be answered as follows: 

(a) We are compelled to play a game against nature (“‘the devil’’) and we do 
apply the new theory of games. If we are not allowed to do so we must abandon 
the new theory of games altogether and return to the old phraseology of ‘taking
into account the psychology of the enemy.” It is exactly the indefiniteness of 
such advice which made the new theory of games the only efficient tool in such 
problems as pursuit at sea; we need not worry whether the helmsman of the other 
boat is awake or not—he can just as well be an automaton and can be countered 
by our automaton. 

(b) Let us drop the anthropomorphic idea of nature and admit that it acts 
blindly, without any strategy whatever. Such a view leads us naturally to de- 
prive the devil of his sequence of coins of various frequencies. But even in such 
a situation it is difficult to imagine a better method for S against D than the new 
formulae: they do apply, not as minimax solutions but because of their independence
of the $p_j$-sequence; they give a definite mean-square error, the only
light in the total darkness.

5° Mr. A has been the first to investigate the blood-groups of a certain African
tribe: he examined n’ men and found m’ with Rh-plus blood among them. Mr. 
B came second and examined another sample; his result was n” and m” respec- 
tively. Mr. B feels obliged to take into account the information contained in 
the paper published by A: he considers the result of his predecessor as furnishing 
a prior distribution and hesitates to apply (10) to his numbers. The sound 
procedure is, however, for B to publish his own numbers n” and m”, to quote the 
numbers of A, and to estimate the Rh fraction of the tribe by setting n = n’ + n”, 
m = m’ + m” in formula (10) with the remark: “making use of all available
observations.” This example illustrates the meaning of our arguments 4°(a)
and 4°(b).

NON-PARAMETRIC EMPIRICAL BAYES PROCEDURES

By M. V. Johns, Jr.

Columbia University


1. Introduction and summary. In the usual formulation of problems in statis- 
tical decision theory the probability distribution of the observations is assumed 
to be a member of some specified class of distribution functions. No a priori 
information is ordinarily assumed to exist concerning which member of this 
class is the true distribution of the observations although a priori probability 
measures defined over this class may be introduced as a technical device for 
generating complete classes of decision functions, minimax decision rules, etc. 
However, in some experimental situations it may be reasonable to suppose 
that such an a priori probability measure actually exists in the sense that the 
distributions of observations occurring in different experiments made under 
similar circumstances may be thought of as having been selected from a speci- 
fied class of distribution functions according to some probability law. 

Such an assumption seems particularly apt in the case where measurements 
are made on an individual selected according to some probability law (e.g., 
“at random’’) from a population and where it is desired to make inferences 
about some characteristic of the individual on the basis of these measurements. 
If the class of probability distributions of the measurements for all individuals 
in the population and the law of selection are known, an optimum Bayes de- 
cision procedure can then be found. In general, however, such information will 
not be available to the experimenter, but there may be observations available 
on individuals previously selected in the same way from the same population 
and, under certain circumstances, these prior observations may be used to ob- 
tain approximations to the optimum Bayes decision procedure. The possibility 
of using prior observations to approximate Bayes procedures was first estab- 
lished for certain estimation problems in [1] by H. Robbins who coined the term 
“empirical Bayes procedures” to describe such approximations. 

Robbins in [1] discusses the estimation, using a squared error loss function,
of the value $\lambda$ of a random variable $\Lambda$ associated with a discrete valued
observation X whose conditional probability function, given $\Lambda$, is $p(x \mid \lambda)$, where
$p(x \mid \lambda)$ is known for each $\lambda$ but where the (a priori) distribution of $\Lambda$ is
unknown. For several specific parametric families of discrete probability functions
$p(x \mid \lambda)$ Robbins shows that if prior independent observations $X_1, X_2, \ldots, X_n$,
each having the same unconditional distribution as X are available, then
an empirical Bayes estimator using $X_1, X_2, \ldots, X_n$ can be found which
converges with probability 1 to the Bayes estimator as n increases, for any a priori
distribution of $\Lambda$.

In Sec. 2 below a similar estimation problem is considered for the “non-
parametric” case where the class of (conditional) probability distributions of X
is not restricted to a particular parametric family, but is instead the class of 
all probability functions assigning probability 1 to some specified denumer- 
able set of numbers. The quantity to be estimated is the value of a functional 
defined on this class of probability functions and it is assumed that there exists 
an unknown a priori probability measure defined on a suitable σ-algebra of
subsets of this class. For this case it is shown that under certain circumstances 
prior observations may be used to construct empirical Bayes estimators having 
the property that, as the number of prior observations increases, the risks of 
the empirical Bayes estimators converge to the risk of the Bayes estimator for 
any a priori probability measure provided that certain moments exist. The 
rate of convergence of these risks is also investigated for two special cases. 

In Sec. 3 the techniques of Sec. 2 are modified to apply to the case where 
the class of (conditional) distributions of X is the class of all absolutely con- 
tinuous distribution functions, and similar results are obtained. 

In Sec. 4 the results of the previous sections are used to obtain empirical 
Bayes solutions for certain two-decision problems of the hypothesis-testing 
type. 

Throughout this paper certain elementary properties of conditional expec- 
tations are used which are immediate consequences of results contained, for 
example, in Chapter VII of [2]. 


2. Estimation: the discrete case. For a specified denumerable set of numbers
$\mathcal{X} = \{x\}$ let $\mathcal{F} = \{F(x \mid \omega) : \omega \in \Omega\}$ be the class of all c.d.f.’s assigning
probability 1 to $\mathcal{X}$, where $\Omega = \{\omega\}$ is an abstract indexing set. Let $\mu$ be an a priori
probability measure defined on a σ-algebra $\mathcal{B}$ of subsets of $\Omega$, and let Y be the
$\Omega$-valued random variable which is the identity mapping of $\Omega$ onto itself. We
may then define the random variables $X_1, X_2, \ldots, X_r$ so that they are
conditionally independently and identically distributed with the common c.d.f.
$F(x \mid \omega)$ given that $Y = \omega$. Finally, for a given measurable function h(x) we
define the random variable $\Lambda$ by

(2.1)    $\Lambda = \Lambda(Y) = E(h(X) \mid Y),$

where X is a generic representative of the $X_j$’s. We assume that the a priori
probability measure space $(\Omega, \mathcal{B}, \mu)$ is such that

(2.2)    $E h^2(X) < \infty.$

Here $\Lambda$ may be thought of as a functional defined on $\mathcal{F}$ or, equivalently, as a
function defined on $\Omega$. We might, for example, define h(x) = x so that the value
of $\Lambda(\omega)$ is the expected value of X given that the c.d.f. of X is $F(x \mid \omega)$. Another
possibility would be to let h(x) = 1 if x ≤ c, and h(x) = 0 otherwise, so that
$\Lambda(\omega) = F(c \mid \omega)$.
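Suppose that we wish to use the vector of observations

$\mathbf{X} = (X_1, X_2, \ldots, X_r)$

to obtain an estimate of the value $\lambda$ assumed by $\Lambda$, where the loss incurred
when t is the estimated value is

(2.3)    $L(t, \lambda) = (t - \lambda)^2.$

Then the risk involved in using any estimator $\varphi(\mathbf{x})$ is

(2.4)    $R(\varphi) = E\, L(\varphi(\mathbf{X}), \Lambda) = E[\varphi(\mathbf{X}) - \Lambda]^2.$

Now from (2.1) and (2.2) we have

(2.5)    $E\Lambda^2 = E\{E^2(h(X) \mid Y)\} \le E h^2(X) < \infty,$

so that $R(\varphi)$ may be written

(2.6)    $R(\varphi) = E\{[\varphi^2(\mathbf{X}) - 2\varphi(\mathbf{X}) E(\Lambda \mid \mathbf{X}) + E^2(\Lambda \mid \mathbf{X})] + E(\Lambda^2 \mid \mathbf{X}) - E^2(\Lambda \mid \mathbf{X})\}.$

The expression in square brackets is a perfect square and hence is non-negative
and equals zero if and only if $\varphi(\mathbf{X}) = E(\Lambda \mid \mathbf{X})$. Thus the Bayes estimator
$\varphi_\mu(\mathbf{x})$ which minimizes $R(\varphi)$ is given by

(2.7)    $\varphi_\mu(\mathbf{x}) = E(\Lambda \mid \mathbf{X} = \mathbf{x})$

for all $\mathbf{x} = (x_1, x_2, \ldots, x_r) \in \mathcal{X}^* = \{\mathbf{x} : \mathrm{Prob}(\mathbf{X} = \mathbf{x}) > 0\}$. The risk of the
Bayes estimator is then

(2.8)    $R(\varphi_\mu) = E\Lambda^2 - E\varphi_\mu^2(\mathbf{X}) < \infty.$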
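A small numerical case may make (2.7) and (2.8) concrete. The sketch below is purely illustrative and rests on assumptions not in the text: r = 1, h(x) = x, Poisson observations, and a hypothetical two-point prior placing mass 1/2 on each of $\lambda = 1$ and $\lambda = 4$. It computes the posterior mean, checks the risk formula (2.8) against a direct evaluation of (2.4), and compares with the risk of the classical estimator x.

```python
from math import exp, factorial

# Hypothetical two-point prior on the Poisson mean (an assumption for illustration).
PRIOR = {1.0: 0.5, 4.0: 0.5}


def poisson_pmf(x, lam):
    return exp(-lam) * lam**x / factorial(x)


def marginal(x):
    """Unconditional probability Prob(X = x)."""
    return sum(w * poisson_pmf(x, lam) for lam, w in PRIOR.items())


def bayes_estimate(x):
    """Posterior mean E(Lambda | X = x), the estimator of (2.7) with r = 1."""
    return sum(lam * w * poisson_pmf(x, lam) for lam, w in PRIOR.items()) / marginal(x)


def risk(estimator, xmax=80):
    """E[estimator(X) - Lambda]^2 as in (2.4), with the sum truncated at xmax."""
    return sum(w * poisson_pmf(x, lam) * (estimator(x) - lam) ** 2
               for lam, w in PRIOR.items() for x in range(xmax + 1))


def bayes_risk_formula(xmax=80):
    """Right-hand side of (2.8): E Lambda^2 - E phi^2(X)."""
    second_moment = sum(w * lam**2 for lam, w in PRIOR.items())
    return second_moment - sum(marginal(x) * bayes_estimate(x) ** 2
                               for x in range(xmax + 1))


if __name__ == "__main__":
    print("Bayes risk (direct):  ", risk(bayes_estimate))
    print("Bayes risk via (2.8): ", bayes_risk_formula())
    print("risk of estimator x:  ", risk(lambda x: float(x)))
```

For Poisson observations the classical estimator x has risk $E\Lambda$, here 2.5, which the Bayes estimator must improve upon.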


To obtain the Bayes estimator $\varphi_\mu$ we must, of course, know the a priori prob-
ability structure of the problem. Suppose now that this structure is unknown 
but that collateral information is available, the form of which is determined 
as follows: 

Let $Y_1, Y_2, \ldots, Y_n$ be mutually independent $\Omega$-valued random variables
each of which is independent of Y and has the same distribution as Y. Then
let the additional information be in the form of vectors of observations
$\mathbf{X}_i = (X_{i1}, X_{i2}, \ldots, X_{i,r+1})$, i = 1, 2, ..., n, where the $\mathbf{X}_i$’s are mutually
independent and independent of $\mathbf{X}$ and where for each i the $X_{ij}$’s are conditionally
independent and identically distributed according to $F(x \mid \omega_i)$ given that
$Y_i = \omega_i$. Here although the $\mathbf{X}_i$’s are independent of $\mathbf{X}$ and $\Lambda$, they nevertheless
contain useful information since they possess the same a priori probability
structure as $\mathbf{X}$. Thus, if we let $\mathbf{X}_i^{(r)} = (X_{i1}, X_{i2}, \ldots, X_{ir})$ then $\mathbf{X}_i^{(r)}$ and
$E(h(X_{i,r+1}) \mid Y_i)$ have the same joint distribution as $\mathbf{X}$ and $\Lambda$ so that

(2.9)    $E(h(X_{i,r+1}) \mid \mathbf{X}_i^{(r)} = \mathbf{x}) = E\{E(h(X_{i,r+1}) \mid Y_i, \mathbf{X}_i^{(r)}) \mid \mathbf{X}_i^{(r)} = \mathbf{x}\}$
         $= E\{E(h(X_{i,r+1}) \mid Y_i) \mid \mathbf{X}_i^{(r)} = \mathbf{x}\}$
         $= E(\Lambda \mid \mathbf{X} = \mathbf{x}) = \varphi_\mu(\mathbf{x}),$

for $\mathbf{x} \in \mathcal{X}^*$. This suggests the following empirical Bayes estimation procedure:
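Let $\mathbf{x}_q$, q = 1, 2, ..., $m(\mathbf{x})$, represent the $m(\mathbf{x})$ distinct vectors obtained
by permuting the components $x_1, x_2, \ldots, x_r$ of $\mathbf{x}$. Clearly for each $\mathbf{x}$ we have
$1 \le m(\mathbf{x}) \le r!$. Now we define the random functions $M_i(\mathbf{x})$, i = 1, 2, ..., n,
and $\bar{M}_n(\mathbf{x})$ by

(2.10)    $M_i(\mathbf{x}) = 1$ if there exists a q, $1 \le q \le m(\mathbf{x})$, such that $\mathbf{X}_i^{(r)} = \mathbf{x}_q$; $M_i(\mathbf{x}) = 0$ otherwise,

and

(2.11)    $\bar{M}_n(\mathbf{x}) = \sum_{i=1}^{n} M_i(\mathbf{x}).$

Then we define an empirical Bayes estimator $\varphi_n(\mathbf{x})$ by

(2.12)    $\varphi_n(\mathbf{x}) = \dfrac{1}{\bar{M}_n(\mathbf{x})} \sum_{i=1}^{n} M_i(\mathbf{x})\, h(X_{i,r+1})$ if $\bar{M}_n(\mathbf{x}) > 0$; $\varphi_n(\mathbf{x}) = 0$ otherwise.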
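The estimator (2.12) translates directly into a few lines of code. The following is a minimal sketch, assuming the prior vectors are supplied as tuples of length r + 1; the function name and the use of `collections.Counter` to test whether the first r components are a permutation of $\mathbf{x}$ are illustrative choices, not part of the original.

```python
from collections import Counter


def empirical_bayes_estimate(x, prior_vectors, h=lambda v: v):
    """Average h(last component) over the prior vectors whose first r
    components are a permutation of x; return 0 if there is no such vector."""
    r = len(x)
    target = Counter(x)
    total, matches = 0.0, 0
    for vec in prior_vectors:
        if Counter(vec[:r]) == target:     # indicator M_i(x) of (2.10)
            matches += 1                   # running count of matching vectors
            total += h(vec[r])             # h(X_{i,r+1})
    return total / matches if matches > 0 else 0.0


if __name__ == "__main__":
    prior = [(1, 2, 5), (2, 1, 7), (1, 1, 9), (3, 2, 4)]
    print(empirical_bayes_estimate((1, 2), prior))   # vectors 1 and 2 match
    print(empirical_bayes_estimate((4, 4), prior))   # no match
```

Comparing multisets rather than enumerating all permutations keeps the matching step linear in r for each prior vector.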


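In order to show that the risk involved in using $\varphi_n$ approaches the risk for the
Bayes estimator $\varphi_\mu$ as n becomes large we first prove two lemmas. For n =
1, 2, ..., let

(2.13)    $P_n(\mathbf{x}) = \mathrm{Prob}\{\bar{M}_n(\mathbf{x}) > 0\},$

(2.14)    $V_n(\mathbf{x}) = 1/\bar{M}_n(\mathbf{x})$ if $\bar{M}_n(\mathbf{x}) > 0$; $V_n(\mathbf{x}) = 0$ otherwise,

(2.15)    $\xi_n(\mathbf{x}) = E V_n(\mathbf{x}),$

and for $\mathbf{x} \in \mathcal{X}^*$,

(2.16)    $\theta(\mathbf{x}) = E(h^2(X_{i,r+1}) \mid \mathbf{X}_i^{(r)} = \mathbf{x}).$

Lemma 1. For any fixed $\mathbf{x} \in \mathcal{X}^*$,

(2.17)    $E\varphi_n(\mathbf{x}) = \varphi_\mu(\mathbf{x}) P_n(\mathbf{x}),$

and

(2.18)    $E\varphi_n^2(\mathbf{x}) = [\theta(\mathbf{x}) - \varphi_\mu^2(\mathbf{x})]\, \xi_n(\mathbf{x}) + \varphi_\mu^2(\mathbf{x}) P_n(\mathbf{x}).$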
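Proof. For notational simplicity we let $H_i = h(X_{i,r+1})$ and suppress the fixed
argument $\mathbf{x}$ ($\in \mathcal{X}^*$) in all functions defined above. Then letting

$U_n = (M_1, M_2, \ldots, M_n)$

and noting that for each i the joint distribution of the $X_{ij}$’s is invariant under
permutations we have

(2.19)    $E(M_i H_i \mid U_n) = E(M_i H_i \mid M_i) = M_i \varphi_\mu,$

by (2.9), and

(2.20)    $E(M_i H_i^2 \mid U_n) = E(M_i H_i^2 \mid M_i) = M_i \theta.$

Also for $i \ne q$,

(2.21)    $E(M_i M_q H_i H_q \mid U_n) = E\{M_i H_i\, E(M_q H_q \mid H_i, M_i, M_q) \mid M_i, M_q\}$
          $= E\{M_i H_i\, E(M_q H_q \mid M_q) \mid M_i, M_q\}$
          $= E(M_i H_i \mid M_i)\, E(M_q H_q \mid M_q) = M_i M_q \varphi_\mu^2.$

Hence

(2.22)    $E\varphi_n = E\left\{ \dfrac{1}{\bar{M}_n} \sum_{i=1}^{n} E(M_i H_i \mid U_n) \,\Big|\, \bar{M}_n > 0 \right\} P_n = \varphi_\mu P_n,$

and

(2.23)    $E\varphi_n^2 = E\left\{ \dfrac{1}{\bar{M}_n^2} \sum_{i=1}^{n} E(M_i H_i^2 \mid U_n) \,\Big|\, \bar{M}_n > 0 \right\} P_n + E\left\{ \dfrac{1}{\bar{M}_n^2} \sum_{i \ne q} E(M_i M_q H_i H_q \mid U_n) \,\Big|\, \bar{M}_n > 0 \right\} P_n$
          $= \theta\, E\left\{ \dfrac{1}{\bar{M}_n} \,\Big|\, \bar{M}_n > 0 \right\} P_n + \varphi_\mu^2\, E\left\{ \dfrac{1}{\bar{M}_n^2} \sum_{i \ne q} M_i M_q \,\Big|\, \bar{M}_n > 0 \right\} P_n.$

Now since

(2.24)    $\dfrac{1}{\bar{M}_n^2} \sum_{i \ne q} M_i M_q = 1 - \dfrac{1}{\bar{M}_n},$

and

(2.25)    $\xi_n = E V_n = E\left\{ \dfrac{1}{\bar{M}_n} \,\Big|\, \bar{M}_n > 0 \right\} P_n,$

we have

(2.26)    $E\varphi_n^2 = \theta \xi_n + \varphi_\mu^2 [P_n - \xi_n],$

and the proof of the lemma is complete.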
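Lemma 2. For any $\mathbf{x} \in \mathcal{X}^*$,

(2.27)    $\lim_{n \to \infty} P_n(\mathbf{x}) = 1,$

and

(2.28)    $\lim_{n \to \infty} \xi_n(\mathbf{x}) = 0.$

Proof. Let

(2.29)    $p(\mathbf{x}) = \mathrm{Prob}\{M_i(\mathbf{x}) = 1\}.$

Then $\mathbf{x} \in \mathcal{X}^*$ implies $p(\mathbf{x}) > 0$. Now for any fixed $\mathbf{x}$, $\bar{M}_n(\mathbf{x})$ is a binomial variable
with parameters n and $p(\mathbf{x})$. Hence for sufficiently large n, $\bar{M}_n(\mathbf{x})$ will be
arbitrarily large with probability arbitrarily close to 1 for any $\mathbf{x} \in \mathcal{X}^*$. This implies
that for $\mathbf{x} \in \mathcal{X}^*$,

(2.30)    $\lim_{n \to \infty} P_n(\mathbf{x}) = \lim_{n \to \infty} \mathrm{Prob}\{\bar{M}_n(\mathbf{x}) > 0\} = 1$

and

(2.31)    $V_n(\mathbf{x}) \to 0$, in probability,

as n → ∞. Now (2.31), together with the fact that $0 \le V_n(\mathbf{x}) \le 1$, implies

(2.32)    $\lim_{n \to \infty} \xi_n(\mathbf{x}) = \lim_{n \to \infty} E V_n(\mathbf{x}) = 0,$

for $\mathbf{x} \in \mathcal{X}^*$, which completes the proof of the lemma.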
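We now prove the following theorem:

Theorem 1. If the a priori probability measure space $(\Omega, \mathcal{B}, \mu)$ is such that
(2.2) is satisfied, then

(2.33)    $\lim_{n \to \infty} R(\varphi_n) = R(\varphi_\mu).$

Proof. We first observe that

(2.34)    $R(\varphi_n) = E[\varphi_n(\mathbf{X}) - \Lambda]^2 = E\varphi_n^2(\mathbf{X}) - 2E[\Lambda \varphi_n(\mathbf{X})] + E\Lambda^2,$

provided that all of the terms on the right are finite. Now since $\mathbf{X}$ and $\Lambda$ are
independent of $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_n$, we have

(2.35)    $E(\varphi_n^2(\mathbf{X}) \mid \mathbf{X} = \mathbf{x}) = E\varphi_n^2(\mathbf{x}),$

and

(2.36)    $E(\Lambda \varphi_n(\mathbf{X}) \mid \mathbf{X} = \mathbf{x}) = E\{\Lambda\, E(\varphi_n(\mathbf{X}) \mid \Lambda, \mathbf{X}) \mid \mathbf{X} = \mathbf{x}\}$
          $= E\{\Lambda\, E(\varphi_n(\mathbf{X}) \mid \mathbf{X}) \mid \mathbf{X} = \mathbf{x}\}$
          $= \varphi_\mu(\mathbf{x})\, E\varphi_n(\mathbf{x}),$

for all $\mathbf{x} \in \mathcal{X}^*$. Hence by (2.17) and (2.18) of Lemma 1 together with (2.34),
(2.35) and (2.36) we have

(2.37)    $R(\varphi_n) = E\Lambda^2 + E[\theta(\mathbf{X}) \xi_n(\mathbf{X})] - E[\varphi_\mu^2(\mathbf{X})(P_n(\mathbf{X}) + \xi_n(\mathbf{X}))].$

Now by (2.27) and (2.28) of Lemma 2, for all $\mathbf{x} \in \mathcal{X}^*$,

(2.38)    $\lim_{n \to \infty} \theta(\mathbf{x}) \xi_n(\mathbf{x}) = 0,$

and

(2.39)    $\lim_{n \to \infty} \varphi_\mu^2(\mathbf{x})(P_n(\mathbf{x}) + \xi_n(\mathbf{x})) = \varphi_\mu^2(\mathbf{x}).$

Also since $0 \le \xi_n(\mathbf{x}) \le 1$ and $0 \le P_n(\mathbf{x}) \le 1$, we have

(2.40)    $|\theta(\mathbf{x}) \xi_n(\mathbf{x})| \le \theta(\mathbf{x}),$

and

(2.41)    $|\varphi_\mu^2(\mathbf{x})(P_n(\mathbf{x}) + \xi_n(\mathbf{x}))| \le 2\varphi_\mu^2(\mathbf{x}),$

and furthermore since (2.2) is satisfied,

(2.42)    $E\theta(\mathbf{X}) = E h^2(X) < \infty,$

and

(2.43)    $E\varphi_\mu^2(\mathbf{X}) < \infty.$

Hence by the Lebesgue Dominated Convergence Theorem we may assert

(2.44)    $\lim_{n \to \infty} R(\varphi_n) = E\Lambda^2 - E\varphi_\mu^2(\mathbf{X}) = R(\varphi_\mu),$

which is the desired result.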

This result is “non-parametric” in the sense that we have assumed that the
unknown a priori probability measure is defined over the class $\mathcal{F}$ of all c.d.f.’s
assigning probability 1 to the set $\mathcal{X}$. If we are willing to assume that some specific
parametric subclass of $\mathcal{F}$ is assigned a priori probability 1 then we may be able
to find empirical Bayes estimators such as those discussed by Robbins in [1]
which (presumably) are more efficient than (2.12) for such cases.

It seems likely that the empirical Bayes estimator $\varphi_n$ given by (2.12) is
relatively inefficient when n is small or r is large relative to n, since, in this case,
$\bar{M}_n(\mathbf{X})$ is small with high probability so that relatively few of the $\mathbf{X}_i$’s contribute
useful information to $\varphi_n$. This difficulty may be offset to some extent by replacing
$\varphi_n$ by an estimate of $\Lambda$ based on the value of $\mathbf{X}$ when $\bar{M}_n(\mathbf{X})$ is small. Thus, for
example, we may define a modified empirical Bayes estimator $\varphi_n^{(c)}$ for fixed $c \ge 0$ by

(2.45)    $\varphi_n^{(c)}(\mathbf{x}) = \varphi_n(\mathbf{x})$ if $\bar{M}_n(\mathbf{x}) > c$; $\varphi_n^{(c)}(\mathbf{x}) = \dfrac{1}{r} \sum_{j=1}^{r} h(x_j)$ if $\bar{M}_n(\mathbf{x}) \le c$.

It is not difficult to verify that $\varphi_n^{(c)}$ has the same asymptotic properties as $\varphi_n$
and that for small n, $R(\varphi_n^{(c)})$ tends to be smaller than $R(\varphi_n)$ except when the
distribution of $\Lambda$ is concentrated near zero.
If the vectors of prior observations $\mathbf{X}_i$ are of the form $\mathbf{X}_i = (X_{i1}, X_{i2}, \ldots, X_{i,r+k_i})$
where $k_i \ge 1$, i = 1, 2, ..., n, we may make use of the additional
information available when $k_i > 1$ by substituting $M_i(\mathbf{x})(w_i/k_i) \sum_{j=1}^{k_i} h(X_{i,r+j})$
for $M_i(\mathbf{x}) h(X_{i,r+1})$ and $\sum_{i=1}^{n} w_i M_i(\mathbf{x})$ for $\bar{M}_n(\mathbf{x})$ in (2.11), where $w_1, w_2, \ldots, w_n$
are positive numerical weights depending on the $k_i$’s. It can be shown by
arguments similar to those of Theorem 1 that whenever the $w_i$’s are uniformly
bounded above and below by positive constants then the risk of the resulting
estimator approaches $R(\varphi_\mu)$ as n becomes large. However, in general when the
$k_i$’s are not all the same the optimum choice of $w_1, w_2, \ldots, w_n$ depends on the
unknown a priori probability structure and hence cannot be determined.

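In practice it may happen that the numbers of components in the $\mathbf{X}_i$’s and in
$\mathbf{X}$ are all the same (= r + 1, say) so that in order to use the estimator $\varphi_n$ given
by (2.12) we must discard one of the components of $\mathbf{X}$. This seems undesirable
and suggests the use of the modified empirical Bayes estimator

(2.46)    $\bar{\varphi}_n(\mathbf{x}) = \dfrac{1}{r + 1} \sum_{j=1}^{r+1} \varphi_n(\mathbf{x}^{(j)}),$

where $\mathbf{x}^{(j)} = (x_1, x_2, \ldots, x_{j-1}, x_{j+1}, \ldots, x_{r+1})$, j = 1, 2, ..., r + 1, and
$\varphi_n$ is given by (2.12). To evaluate the performance of $\bar{\varphi}_n(\mathbf{x})$ we compare its risk
with that of $\varphi_n$, which uses only r components of $\mathbf{X}$, as follows: For all n,

(2.47)    $R(\bar{\varphi}_n) = E\Lambda^2 + E\bar{\varphi}_n^2(\mathbf{X}) - 2E[\Lambda \bar{\varphi}_n(\mathbf{X})]$
          $= E\Lambda^2 + \dfrac{1}{r + 1} E\varphi_n^2(\mathbf{X}^{(1)}) + \dfrac{r}{r + 1} E[\varphi_n(\mathbf{X}^{(1)}) \varphi_n(\mathbf{X}^{(2)})] - 2E[\Lambda \varphi_n(\mathbf{X}^{(1)})]$
          $= R(\varphi_n) - \dfrac{r}{2(r + 1)} E[\varphi_n(\mathbf{X}^{(1)}) - \varphi_n(\mathbf{X}^{(2)})]^2$
          $\le R(\varphi_n),$

with equality holding only in the degenerate case where $\varphi_n(\mathbf{X}^{(1)}) = \varphi_n(\mathbf{X}^{(2)})$ with
probability 1. By arguments similar to those of Theorem 1 it may be shown that

(2.48)    $\lim_{n \to \infty} R(\bar{\varphi}_n) = R(\bar{\varphi}_\mu) \le R(\varphi_\mu),$

where $R(\bar{\varphi}_\mu)$ is the risk of the estimator

(2.49)    $\bar{\varphi}_\mu(\mathbf{x}) = \dfrac{1}{r + 1} \sum_{j=1}^{r+1} E(\Lambda \mid \mathbf{X}^{(j)} = \mathbf{x}^{(j)}),$

and $R(\varphi_\mu)$ is the risk of the ordinary Bayes estimator $\varphi_\mu(\mathbf{x}^{(1)}) = E(\Lambda \mid \mathbf{X}^{(1)} = \mathbf{x}^{(1)})$
based on only r of the r + 1 components of $\mathbf{X}$. The inequality in (2.48)
will again be strict except in degenerate cases.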
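The step from the second to the third line of the risk comparison rests on the exchangeability of the components of $\mathbf{X}$. As a worked sketch, writing $\varphi^{(j)}$ as shorthand (introduced here only for brevity) for $\varphi_n(\mathbf{X}^{(j)})$ and using $E(\varphi^{(1)})^2 = E(\varphi^{(2)})^2$:

```latex
E(\varphi^{(1)})^2 - E[\varphi^{(1)}\varphi^{(2)}]
  = \tfrac12\left\{E(\varphi^{(1)})^2 + E(\varphi^{(2)})^2\right\} - E[\varphi^{(1)}\varphi^{(2)}]
  = \tfrac12\,E\left[\varphi^{(1)} - \varphi^{(2)}\right]^2 ,
\qquad
R(\bar\varphi_n) - R(\varphi_n)
  = -\frac{r}{r+1}\left\{E(\varphi^{(1)})^2 - E[\varphi^{(1)}\varphi^{(2)}]\right\}
  = -\frac{r}{2(r+1)}\,E\left[\varphi^{(1)} - \varphi^{(2)}\right]^2 .
```

The coefficient $r/(r+1)$ arises because the square of the average of r + 1 exchangeable terms contains r + 1 squared terms and r(r + 1) cross terms, each divided by $(r+1)^2$.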
It would be interesting to compute the value of $R(\varphi_n)$ for various values of n
for some specific a priori probability distribution in order to obtain some idea
of the rate at which $R(\varphi_n)$ converges to $R(\varphi_\mu)$. Unfortunately, such computations
are extremely lengthy even in the simplest non-trivial cases. However, reasonably
good upper and lower bounds for $R(\varphi_n)$ may be computed without much
difficulty by using simple bounds for $\xi_n(\mathbf{x})$ obtained as follows:


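(2.50)    $\xi_n(\mathbf{x}) = E V_n(\mathbf{x}) = \sum_{s=1}^{n} \dfrac{1}{s}\, b(s; n, p(\mathbf{x})),$

where $b(s; n, p(\mathbf{x}))$ is the probability of s successes occurring in n independent
trials where the probability of success at each trial is $p(\mathbf{x}) = \mathrm{Prob}\{M_i(\mathbf{x}) = 1\}$.
Also

(2.51)    $\sum_{s=1}^{n} \dfrac{1}{s + 1}\, b(s; n, p(\mathbf{x})) = \dfrac{1}{(n + 1) p(\mathbf{x})} \sum_{s=1}^{n} b(s + 1; n + 1, p(\mathbf{x})) = \dfrac{1}{(n + 1) p(\mathbf{x})} \{1 - [1 + n p(\mathbf{x})][1 - p(\mathbf{x})]^n\},$

for $\mathbf{x} \in \mathcal{X}^*$. Hence, letting

(2.52)    $b_n(\mathbf{x}) = \dfrac{1}{(n + 1) p(\mathbf{x})} \{1 - [1 + n p(\mathbf{x})][1 - p(\mathbf{x})]^n\},$

and noting that $1/(s + 1) < 1/s \le 2/(s + 1) \le 1$ for all $s \ge 1$, we have

(2.53)    $b_n(\mathbf{x}) < \xi_n(\mathbf{x}) \le 2 b_n(\mathbf{x}) \le 1,$

for $\mathbf{x} \in \mathcal{X}^*$.

Suppose now that we wish to estimate the expected value of a non-negative
integer-valued random variable X (i.e., we set h(x) = x and $\mathcal{X} = \{0, 1, 2, \ldots\}$),
and suppose that r = 1 so that $\mathbf{X} = X$ and $\mathbf{X}_i = (X_{i1}, X_{i2})$, i = 1, 2, ..., n.
For this problem we may compute the upper bounds based on (2.53) for the risks
of the estimators $\varphi_n$ and $\varphi_n^{(c)}$ given respectively by (2.12) and (2.45) with c = 0,
for the particular a priori probability measure $\mu$ which assigns probability 1 to
the family of Poisson c.d.f.’s

(2.54)    $F(x \mid \lambda) = \sum_{t=0}^{[x]} \dfrac{e^{-\lambda} \lambda^t}{t!}, \qquad x \ge 0,\ \lambda > 0,$

and which induces the Γ-distribution

(2.55)    $G(\lambda) = \dfrac{\alpha^{\beta}}{\Gamma(\beta)} \int_0^{\lambda} u^{\beta - 1} e^{-\alpha u}\, du; \qquad \alpha, \beta > 0,$
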

as the c.d.f. of A. In Table I below we compare the upper and lower bounds for 
R(g,) and R(ys) with the risk R(y,) of the Bayes estimator ¢, and with the 
risk R(x) of the classical estimator xz. These quantities are computed for six 
values of $n$, for $\alpha = 2$ and $\beta = 10$ (i.e., $E\Lambda = 5$ and $\mathrm{Var}\,\Lambda = 2.5$). In Table II
we compare the same quantities for the case where $\mu$ assigns a priori probability
1 to the particular member of the family of c.d.f.'s (2.54) for which $\lambda = \lambda_0 = 5$
(i.e., $\Lambda = \lambda_0$ with probability 1).
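
The bounds (2.53) are easy to check numerically. The following is a minimal sketch, not the computation used for the tables: it assumes $r = 1$, the Poisson family (2.54), and a gamma prior with rate $\alpha$ and shape $\beta$, so that $p(x)$ is the negative binomial marginal probability of observing $x$. The function names are illustrative and do not come from the paper.

```python
import math


def negative_binomial_pmf(x, alpha, beta):
    """Marginal P(X = x) for X | lambda ~ Poisson(lambda),
    lambda ~ Gamma(shape=beta, rate=alpha). Assumed parametrization."""
    log_p = (math.lgamma(beta + x) - math.lgamma(beta) - math.lgamma(x + 1)
             + beta * math.log(alpha / (alpha + 1.0))
             - x * math.log(alpha + 1.0))
    return math.exp(log_p)


def xi_exact(n, p):
    """xi_n = E V_n = sum_{s=1}^{n} (1/s) b(s; n, p), as in (2.50)."""
    q = 1.0 - p
    return sum(math.comb(n, s) * p**s * q**(n - s) / s for s in range(1, n + 1))


def b_lower(n, p):
    """b_n of (2.52): the closed-form lower bound for xi_n."""
    return (1.0 - (1.0 + n * p) * (1.0 - p) ** n) / ((n + 1) * p)


if __name__ == "__main__":
    p = negative_binomial_pmf(5, alpha=2.0, beta=10.0)
    for n in (2, 10, 100):
        lower, xi = b_lower(n, p), xi_exact(n, p)
        # (2.53): b_n < xi_n <= 2 b_n <= 1
        print(n, lower, xi, 2.0 * lower)
```

The exact sum costs $O(n)$ per value of $x$, whereas the two bounds are closed-form, which is the practical point of (2.53).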
TABLE I


R(yn) RW | 


10.70 | 12.51 | 5.21 

| 8.82 4.43 
5.54 3.50 
3.87 | 2.79 
om” | £22 
| 2.34 2.05 


| 
| 
Lower Bound | Upper Bound | Lower Bound Upper Bound | 
ee ee 





TABLE II
| 
R(gn) | Re?) 
ns Ek ala mea OR 
Lower Bound | Upper Bound Lower Bound | Upper Bound 


6.06 i 3.68 : | 
2.94 | y 2.57 y 

1.46 ; | 1.57 q 
~ ae J | 0.91 ‘ 
0.38 | 0.51 

0.19 | : 0.27 





3. Estimation: the continuous case. In this section we extend the methods 
of Sec. 2 to cover the case where the observed X’s possess absolutely continu- 
ous distribution functions. As before these results will be “non-parametric” in 
the sense that the unknown a priori probability measure is assumed to be defined
over the class of all absolutely continuous c.d.f.'s subject only to some
conditions on the existence of moments.

Let $\mathfrak{F} = \{F(x \mid \omega): \omega \in \Omega\}$ be the collection of all absolutely continuous c.d.f.'s
where $\Omega = \{\omega\}$ is an abstract indexing set, and let $(\Omega, \mathcal{A}, \mu)$ be an a priori
probability measure space where $\mathcal{A}$ is a $\sigma$-algebra of subsets of $\Omega$. Then there exists a
function $f(u \mid \omega)$ defined on (reals) $\times\ \Omega$ such that

(3.1)    $F(x \mid \omega) = \int_{-\infty}^{x} f(u \mid \omega)\, du,$

for each $\omega \in \Omega$. We assume that $(\Omega, \mathcal{A})$ is such that the function $f(u \mid \omega)$ is a
measurable function on the product space (reals) $\times\ \Omega$.

Let $Y = Y(\omega)$ be the $\Omega$-valued random variable which is the identity mapping
of $\Omega$ onto itself. Let the (real) random variables $X_1, X_2, \cdots, X_r$ be conditionally
independent and identically distributed according to $F(x \mid \omega)$ given that $Y = \omega$,
and let $X = (X_1, X_2, \cdots, X_r)$. Then the unconditional joint c.d.f. of $X_1, X_2, \cdots, X_r$ will be

(3.2)    $F(x_1, \cdots, x_r) = F(x) = \int_{\Omega} \prod_{j=1}^{r} F(x_j \mid \omega)\, d\mu$
         $= \int_{\Omega} \prod_{j=1}^{r} \int_{-\infty}^{x_j} f(u_j \mid \omega)\, du_j\, d\mu$
         $= \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_r} f(u_1, \cdots, u_r)\, du_1 \cdots du_r,$

where

(3.3)    $f(x_1, \cdots, x_r) = f(x) = \int_{\Omega} \prod_{j=1}^{r} f(x_j \mid \omega)\, d\mu.$

Thus $f(x)$ is the joint unconditional probability density function of $X_1, \cdots, X_r$.
Now for a given measurable function $h(x)$ we define the random variable $\Lambda$ by

(3.4)    $\Lambda = \Lambda(Y) = E(h(X) \mid Y),$

where $X$ is a generic representative of the $X_j$'s. As in Sec. 2 we assume that

(3.5)    $E h^2(X) < \infty,$

which implies $E\Lambda^2 < \infty$ and hence the existence of all conditional expectations
with which we will be concerned. Now if we wish to estimate the value of $\Lambda$
using $X$ where the risk is the expected squared error, then, as before, the essentially
unique Bayes estimator is

(3.6)    $\psi_r(x) = E(\Lambda \mid X = x).$


(In this section when conditional expectations are regarded as functions of the 
values of random variables it will be understood that we mean the essentially 
unique Borel measurable version which is set equal to zero whenever arbitrariness 
is possible on sets of positive Lebesgue measure.) 

As in Sec. 2 we introduce random variables $Y_1, \cdots, Y_n$ independent of each
other and of $Y$ such that each has the same distribution as $Y$. We also introduce
the random vectors of prior observations $X_i = (X_{i1}, \cdots, X_{i,r+1})$, $i = 1, 2, \cdots, n$,
where the $X_i$'s are independent of each other and of $X$ and where for
each $i$ the $X_{ij}$'s are conditionally independent and identically distributed according
to $F(x \mid \omega_i)$ given that $Y_i = \omega_i$. As before we let $X_i^{(r)} = (X_{i1}, X_{i2}, \cdots, X_{ir})$,
$i = 1, 2, \cdots, n$, and we note that for each $i$

(3.7)    $E(h(X_{i,r+1}) \mid X_i^{(r)} = x) = E(\Lambda \mid X = x) = \psi_r(x).$


In order to make use of the results of Sec. 2 we must discretize the $X$'s in some
way. To this end we consider the double sequence of half-open intervals

(3.8)    $I_j^{(n)} = \left[ \frac{jc}{n^{\delta/r}},\ \frac{(j+1)c}{n^{\delta/r}} \right), \qquad j = 0, \pm 1, \pm 2, \cdots; \quad n = 1, 2, \cdots,$

where $c > 0$ and $0 < \delta < 1$. For each $n$ we partition $r$-dimensional euclidean
space into a countable sequence of non-overlapping hypercubes $C_j^{(n)}$, $j = 1, 2, \cdots$,
of the form

(3.9)    $C_j^{(n)} = I_{t_{1j}}^{(n)} \times I_{t_{2j}}^{(n)} \times \cdots \times I_{t_{rj}}^{(n)}, \qquad j = 1, 2, \cdots,$

where the $t_{ij}$'s are suitably chosen integers. Then for each $n$ and each $r$-component
numerical vector $x = (x_1, x_2, \cdots, x_r)$ we let $C^{(n)}(x)$ be the unique member
of the sequence (3.9) containing $x$. As before, we designate by $x_{(q)}$, $q = 1, 2, \cdots, m(x)$,
$m(x) \ge 1$, the distinct vectors obtained by permuting the components
of $x$. Then proceeding by analogy with Sec. 2 we define the random functions


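(3.10)    $M_i^{(n)}(x) = 1$ if there exists a $q$, $1 \le q \le m(x)$, such that $X_i^{(r)} \in C^{(n)}(x_{(q)})$,
          $M_i^{(n)}(x) = 0$ otherwise,

for $i = 1, 2, \cdots, n$, and

(3.11)    $M^{(n)}(x) = \sum_{i=1}^{n} M_i^{(n)}(x),$

and finally we define the empirical Bayes estimator $\psi_n(x)$ by

(3.12)    $\psi_n(x) = \frac{1}{M^{(n)}(x)} \sum_{i=1}^{n} M_i^{(n)}(x)\, h(X_{i,r+1})$ if $M^{(n)}(x) > 0$,
          $\psi_n(x) = 0$ otherwise.

Before we can show that $\lim_{n \to \infty} R(\psi_n) = R(\psi_r)$ we must prove three preliminary
lemmas.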
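
The estimator (3.12) can be sketched in a few lines for the scalar case $r = 1$, where $m(x) = 1$ and each hypercube is a single interval. This is an illustrative sketch only: it assumes intervals of common width $c / n^{\delta}$ anchored at the origin, and the function and argument names are hypothetical.

```python
import math


def cell_index(value, n, c, delta):
    """Index j of the half-open interval [j*w, (j+1)*w) containing value,
    where w = c / n**delta (assumed interval width for r = 1)."""
    width = c / n ** delta
    return math.floor(value / width)


def empirical_bayes_estimate(x, prior_pairs, c=1.0, delta=0.5, h=lambda v: v):
    """Average h(X_i2) over prior pairs (X_i1, X_i2) whose first component
    falls in the same cell as x; return 0 when no pair matches, as in (3.12)."""
    n = len(prior_pairs)
    target = cell_index(x, n, c, delta)
    matched = [h(second) for first, second in prior_pairs
               if cell_index(first, n, c, delta) == target]
    if not matched:
        return 0.0
    return sum(matched) / len(matched)


if __name__ == "__main__":
    pairs = [(0.1, 2.0), (0.2, 4.0), (5.0, 100.0), (0.3, 6.0)]
    print(empirical_bayes_estimate(0.25, pairs))  # averages the three nearby pairs
    print(empirical_bayes_estimate(3.0, pairs))   # no prior pair in this cell
```

The cell width shrinks as $n$ grows while the expected number of matches still increases, which is the trade-off the constants $c$ and $\delta$ control.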
We first let

(3.13)    $\varphi^{(n)}(x) = E(h(X_{1,r+1}) \mid M_1^{(n)}(x) = 1), \qquad n = 1, 2, \cdots,$

and

(3.14)    $\theta^{(n)}(x) = E(h^2(X_{1,r+1}) \mid M_1^{(n)}(x) = 1), \qquad n = 1, 2, \cdots.$

For each $n$, $\varphi^{(n)}(x)$ and $\theta^{(n)}(x)$ are analogous respectively to $\varphi_n(x)$ and $\theta(x)$ of
Sec. 2. We now prove the following lemma:

Lemma 3. For almost all $x$ such that $f(x) > 0$,

(3.15)    $\lim_{n \to \infty} \varphi^{(n)}(x) = E(h(X_{1,r+1}) \mid X_1^{(r)} = x) = \psi_r(x),$

and

(3.16)    $\lim_{n \to \infty} \theta^{(n)}(x) = E(h^2(X_{1,r+1}) \mid X_1^{(r)} = x).$

Proof. Let the joint density function for the $r + 1$ components of $X_i$ (for any
fixed $i$) be written as

(3.17)    $f(x, x_{r+1}) = f(x_1, x_2, \cdots, x_r, x_{r+1}) = \int_{\Omega} \prod_{j=1}^{r+1} f(x_j \mid \omega)\, d\mu.$

Then since $f(x, x_{r+1}) = f(x_{(q)}, x_{r+1})$, $q = 1, 2, \cdots, m(x)$, we have for almost
all $x$

(3.18)    $\varphi^{(n)}(x) = \frac{\int_{C^{(n)}(x)} \int_{-\infty}^{\infty} h(v) f(u, v)\, dv\, du}{\int_{C^{(n)}(x)} f(u)\, du},$

provided the denominator is not zero. (If the denominator is zero, we conventionally
set $\varphi^{(n)}(x) = 0$.) Now for each $x$, $C^{(n)}(x)$, $n = 1, 2, \cdots$, is a sequence
of hypercubes converging regularly to the point $x$. Hence for almost all $x$

(3.19)    $\lim_{n \to \infty} \frac{1}{\mathrm{Volume}(C^{(n)}(x))} \int_{C^{(n)}(x)} \int_{-\infty}^{\infty} h(v) f(u, v)\, dv\, du = \int_{-\infty}^{\infty} h(v) f(x, v)\, dv,$

by a well-known theorem on the differentiation of multiple Lebesgue integrals.
A similar limit holds for the denominator of (3.18) so that

(3.20)    $\lim_{n \to \infty} \varphi^{(n)}(x) = \frac{\int_{-\infty}^{\infty} h(v) f(x, v)\, dv}{f(x)} = E(h(X_{1,r+1}) \mid X_1^{(r)} = x) = \psi_r(x),$

for almost all $x$ such that $f(x) > 0$. Similarly

(3.21)    $\lim_{n \to \infty} \theta^{(n)}(x) = \lim_{n \to \infty} \frac{\int_{C^{(n)}(x)} \int_{-\infty}^{\infty} h^2(v) f(u, v)\, dv\, du}{\int_{C^{(n)}(x)} f(u)\, du} = \frac{\int_{-\infty}^{\infty} h^2(v) f(x, v)\, dv}{f(x)} = E(h^2(X_{1,r+1}) \mid X_1^{(r)} = x),$

for almost all $x$ such that $f(x) > 0$, so that (3.15) and (3.16) are verified.
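For $n = 1, 2, \cdots$, let

(3.22)    $V^{(n)}(x) = \frac{1}{M^{(n)}(x)}$ if $M^{(n)}(x) > 0$,
          $V^{(n)}(x) = 0$ otherwise,

(3.23)    $\xi^{(n)}(x) = E V^{(n)}(x),$

and

(3.24)    $P^{(n)}(x) = \mathrm{Prob}\{M^{(n)}(x) > 0\}.$

For each $n$, $V^{(n)}(x)$, $\xi^{(n)}(x)$, and $P^{(n)}(x)$ are analogous respectively to $V_n(x)$,
$\xi_n(x)$, and $P_n(x)$ of Sec. 2.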
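Lemma 4. For almost all $x$ such that $f(x) > 0$,

(3.25)    $\lim_{n \to \infty} \xi^{(n)}(x) = 0,$

and

(3.26)    $\lim_{n \to \infty} P^{(n)}(x) = 1.$

Proof. Let

(3.27)    $p^{(n)}(x) = \int_{C^{(n)}(x)} f(u)\, du.$

We have remarked that for almost all $x$

(3.28)    $\lim_{n \to \infty} \frac{1}{\mathrm{Volume}(C^{(n)}(x))} \int_{C^{(n)}(x)} f(u)\, du = f(x).$

By (3.8) and (3.9) we have

(3.29)    $\mathrm{Volume}(C^{(n)}(x)) = \frac{c^{r}}{n^{\delta}},$

so that whenever $f(x) > 0$ we may write

(3.30)    $p^{(n)}(x) = \frac{c^{r} f(x)}{n^{\delta}} (1 + \epsilon_n(x)),$

where $\lim_{n \to \infty} \epsilon_n(x) = 0$ for almost all $x$.
Referring to the upper bound (2.53) obtained for $\xi_n(x)$ in Sec. 2 and noting
that $m(x) p^{(n)}(x)$ plays the same role as $p(x)$ and that $m(x) \ge 1$, we see that

(3.31)    $\xi^{(n)}(x) \le \frac{2}{(n+1)\, m(x)\, p^{(n)}(x)} \le \frac{2}{c^{r} n^{1-\delta} f(x) [1 + \epsilon_n(x)]},$

for all $x$ such that $f(x) > 0$. Hence

(3.32)    $\lim_{n \to \infty} \xi^{(n)}(x) = 0,$

for almost all $x$ such that $f(x) > 0$ and (3.25) is verified. Now

(3.33)    $P^{(n)}(x) = \mathrm{Prob}\{M^{(n)}(x) > 0\} = 1 - [1 - m(x) p^{(n)}(x)]^{n},$

and since for all $u \le 1$, $e^{-u} \ge 1 - u \ge 0$, and $m(x) \ge 1$, we may write

(3.34)    $P^{(n)}(x) \ge 1 - e^{-n p^{(n)}(x)}.$

Hence for almost all $x$ such that $f(x) > 0$,

(3.35)    $\liminf_{n \to \infty} P^{(n)}(x) \ge 1 - \lim_{n \to \infty} e^{-c^{r} n^{1-\delta} f(x) [1 + \epsilon_n(x)]} = 1,$

which implies (3.26).
We now prove a simple convergence lemma.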
We now prove a simple convergence lemma. 
Lemma 5. If $(S, \mathcal{A}, \nu)$ is a measure space, $\{f_n\}$ and $\{g_n\}$ are sequences of non-negative
integrable functions, $f$ is an integrable function and $g$ is a function such that

(3.36)    (i) $\lim_{n \to \infty} f_n = f$, a.e.; $\lim_{n \to \infty} g_n = g$, a.e.;
          (ii) $g_n \le f_n$, all $n$;
          (iii) $\limsup_{n \to \infty} \int f_n\, d\nu \le \int f\, d\nu$;

then $g$ is integrable and

(3.37)    $\lim_{n \to \infty} \int g_n\, d\nu = \int g\, d\nu.$

Proof. By (i), (ii) and Fatou's Lemma, $g$ is integrable,

(3.38)    $\liminf_{n \to \infty} \int g_n\, d\nu \ge \int g\, d\nu,$

and (noting (iii)),

(3.39)    $\lim_{n \to \infty} \int f_n\, d\nu = \int f\, d\nu.$

Furthermore,

(3.40)    $\limsup_{n \to \infty} \int (f_n - g_n)\, d\nu \le \limsup_{n \to \infty} \int f_n\, d\nu - \liminf_{n \to \infty} \int g_n\, d\nu \le \int (f - g)\, d\nu,$

and by (i) and (ii), $\lim_{n \to \infty} (f_n - g_n) = f - g$ and $(f_n - g_n) \ge 0$, so that applying
Fatou's Lemma again we obtain

(3.41)    $\lim_{n \to \infty} \int (f_n - g_n)\, d\nu = \int f\, d\nu - \int g\, d\nu.$

The desired result then follows from (3.39) and (3.41).
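THEOREM 3. If the a priori probability measure space $(\Omega, \mathcal{A}, \mu)$ is such that
(3.5) is satisfied, then

(3.42)    $\lim_{n \to \infty} R(\psi_n) = R(\psi_r).$

Proof. Since $X$ and $\Lambda$ are independent of $X_1, X_2, \cdots, X_n$, we have (as in
Sec. 2)

(3.43)    $E(\psi_n^2(X) \mid X = x) = E\psi_n^2(x),$

and

(3.44)    $E(\Lambda \psi_n(X) \mid X = x) = \psi_r(x)\, E\psi_n(x),$

for almost all $x$. Now applying Lemma 1 of Sec. 2 to $\psi_n(x)$ (mutatis mutandis)
we obtain

(3.45)    $E\psi_n(x) = \varphi^{(n)}(x) P^{(n)}(x),$

and

(3.46)    $E\psi_n^2(x) = (\theta^{(n)}(x) - \varphi^{(n)2}(x))\, \xi^{(n)}(x) + \varphi^{(n)2}(x) P^{(n)}(x),$

for almost all $x$. Hence

(3.47)    $E\psi_n^2(X) = E[\theta^{(n)}(X) \xi^{(n)}(X)] - E[\varphi^{(n)2}(X) \xi^{(n)}(X)] + E[\varphi^{(n)2}(X) P^{(n)}(X)],$

and

(3.48)    $E[\Lambda \psi_n(X)] = E[\psi_r(X) \varphi^{(n)}(X) P^{(n)}(X)].$

Now in terms of the hypercubes $C_j^{(n)}$, $j = 1, 2, \cdots$, defined by (3.9) we may
write

(3.49)    $E\varphi^{(n)2}(X) = \sum_{j=1}^{\infty} \frac{\left[ \int_{C_j^{(n)}} \int_{-\infty}^{\infty} h(v) f(u, v)\, dv\, du \right]^2}{\int_{C_j^{(n)}} f(u)\, du},$

(3.50)    $E\psi_r^2(X) = \sum_{j=1}^{\infty} \int_{C_j^{(n)}} \frac{\left[ \int_{-\infty}^{\infty} h(v) f(u, v)\, dv \right]^2}{f(u)}\, du,$

where it is understood that the ratios appearing in these expressions are to be
replaced by zero whenever their denominators vanish. Furthermore, for each
$j$ we have

(3.51)    $\left[ \int_{C_j^{(n)}} \int_{-\infty}^{\infty} h(v) f(u, v)\, dv\, du \right]^2 \le \int_{C_j^{(n)}} f(u)\, du \cdot \int_{C_j^{(n)}} \frac{\left[ \int_{-\infty}^{\infty} h(v) f(u, v)\, dv \right]^2}{f(u)}\, du,$

by the Schwarz inequality. Hence

(3.52)    $E\varphi^{(n)2}(X) \le E\psi_r^2(X),$

and since $0 \le P^{(n)}(x) \le 1$,

(3.53)    $E[\varphi^{(n)2}(X) P^{(n)}(X)] \le E\psi_r^2(X).$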
Now by (3.15) of Lemma 3 and (3.26) of Lemma 4

(3.54)    $\lim_{n \to \infty} \varphi^{(n)2}(x) P^{(n)}(x) = \psi_r^2(x),$

for almost all $x$ such that $f(x) > 0$. Hence by Lemma 5 (with $f_n = g_n = \varphi^{(n)2}(x) P^{(n)}(x)$) we have

(3.55)    $\lim_{n \to \infty} E[\varphi^{(n)2}(X) P^{(n)}(X)] = E\psi_r^2(X).$

Now

(3.56)    $E\theta^{(n)}(X) = E h^2(X_{1,r+1}) = E\{E(h^2(X_{1,r+1}) \mid X_1^{(r)})\},$

for all $n$, and by (3.16) of Lemma 3

(3.57)    $\lim_{n \to \infty} \theta^{(n)}(x) = E(h^2(X_{1,r+1}) \mid X_1^{(r)} = x),$

for almost all $x$ such that $f(x) > 0$. Also, $0 \le \xi^{(n)}(x) \le 1$ and by (3.25) of
Lemma 4

(3.58)    $\lim_{n \to \infty} \xi^{(n)}(x) = 0,$

for almost all $x$ such that $f(x) > 0$, so that by Lemma 5 (with $f_n = \theta^{(n)}(x)$ and
$g_n = \theta^{(n)}(x) \xi^{(n)}(x)$) we have

(3.59)    $\lim_{n \to \infty} E[\theta^{(n)}(X) \xi^{(n)}(X)] = 0.$

Similarly, in view of (3.15) of Lemma 3 and (3.52),

(3.60)    $\lim_{n \to \infty} E[\varphi^{(n)2}(X) \xi^{(n)}(X)] = 0.$

Hence by (3.47), (3.55), (3.59), and (3.60)

(3.61)    $\lim_{n \to \infty} E\psi_n^2(X) = E\psi_r^2(X).$

Now for any fixed $n$ and $j$ the functions $\varphi^{(n)}(x)$ and $P^{(n)}(x)$ are constant for
all $x \in C_j^{(n)}$ and we may designate their values by $\varphi_j^{(n)}$ and $P_j^{(n)}$ respectively.
Then by (3.48) we have for each $n$

(3.62)    $E[\Lambda \psi_n(X)] = E[\psi_r(X) \varphi^{(n)}(X) P^{(n)}(X)]$
          $= \sum_{j=1}^{\infty} \varphi_j^{(n)} P_j^{(n)} \int_{C_j^{(n)}} \psi_r(x) f(x)\, dx$
          $= \sum_{j=1}^{\infty} \varphi_j^{(n)} P_j^{(n)} \int_{C_j^{(n)}} \int_{-\infty}^{\infty} h(v) f(x, v)\, dv\, dx$
          $= \sum_{j=1}^{\infty} \varphi_j^{(n)2} P_j^{(n)} \int_{C_j^{(n)}} f(x)\, dx$
          $= E[\varphi^{(n)2}(X) P^{(n)}(X)],$

so that by (3.55)

(3.63)    $\lim_{n \to \infty} E[\Lambda \psi_n(X)] = E\psi_r^2(X).$

Now

(3.64)    $R(\psi_n) = E\Lambda^2 + E\psi_n^2(X) - 2E[\Lambda \psi_n(X)],$

so that by (3.61) and (3.63) we have

(3.65)    $\lim_{n \to \infty} R(\psi_n) = E\Lambda^2 + \lim_{n \to \infty} E\psi_n^2(X) - 2 \lim_{n \to \infty} E[\Lambda \psi_n(X)] = E\Lambda^2 - E\psi_r^2(X) = R(\psi_r),$

which was to be proved.

The estimation procedure introduced in this section contains an element of
arbitrariness arising from the fact that the definition of the sequence of intervals
$\{I_j^{(n)}\}$ involves the two constants $c$ and $\delta$ whose values must be specified. The
problem of the proper choice of $c$ and $\delta$ will not, however, be considered further
here.

The remarks made in Sec. 2 concerning various modifications of the estimator
$\varphi_n$ apply as well to the analogous modifications of the estimator $\psi_n$.


4. Hypothesis testing. The empirical Bayes estimation procedures introduced 
in the preceding sections may be applied to certain two-decision problems of 
the hypothesis-testing type. This is illustrated by the following two examples: 

Example 1. (One-sided alternatives): Suppose that we wish to test a hypothesis
about the value $\lambda$ of the random variable $\Lambda$ associated with the vector of
observations $X$. In particular, suppose that we wish to test the hypothesis $H_0: \lambda < a$
versus the alternative hypothesis $H_1: \lambda \ge a$. Let $A_0$ represent the action of
accepting $H_0$ and let $A_1$ represent the action of accepting $H_1$. Then we may
define a loss function $L$ as follows:

(4.1)    $L(A_i, \lambda) = \max(0, \lambda - a)$ for $i = 0$,
         $L(A_i, \lambda) = -\min(0, \lambda - a)$ for $i = 1$.

In the decision theoretic framework this loss function is certainly no less reasonable
than the classical zero-one loss function usually postulated for hypothesis-testing
problems. Now for any decision function $\delta(x) = \mathrm{Prob}\{A_1 \mid X = x\}$ =
probability of rejecting $H_0$ when $x$ is observed, the risk involved in using $\delta$ is
given by

(4.2)    $R(\delta) = E\{\delta(X) L(A_1, \Lambda)\} + E\{[1 - \delta(X)] L(A_0, \Lambda)\}$
         $= E L(A_0, \Lambda) - E\{\delta(X)[L(A_0, \Lambda) - L(A_1, \Lambda)]\}$
         $= E L(A_0, \Lambda) - E\{\delta(X)[\Lambda - a]\}$
         $= E L(A_0, \Lambda) - E\{\delta(X)[E(\Lambda \mid X) - a]\}.$

Hence the Bayes decision function $\delta_B(x)$ minimizing $R(\delta)$ is

(4.3)    $\delta_B(x) = 1$ if $E(\Lambda \mid X = x) > a$,
         $\delta_B(x) = 0$ otherwise.

Now we have seen in the previous sections that when the a priori probability
measure (and hence the joint distribution of $\Lambda$ and $X$) is unknown we may still
be able, under certain circumstances, to find an empirical Bayes estimator $\varphi_n(x)$
based on prior independent observations, such that

(4.4)    $\lim_{n \to \infty} E[\varphi_n(x) - E(\Lambda \mid X = x)]^2 = 0,$

for all $x$ in some set $S$ which is assigned probability 1 under the distribution of
$X$. Now (4.4) implies that as $n \to \infty$,

(4.5)    $\varphi_n(x) \to E(\Lambda \mid X = x)$, in probability,

for all $x \in S$, so that if we define the empirical Bayes decision function $\delta_n(x)$ by

(4.6)    $\delta_n(x) = 1$ if $\varphi_n(x) > a$,
         $\delta_n(x) = 0$ otherwise,

we will have

(4.7)    $\lim_{n \to \infty} E\delta_n(x) = \lim_{n \to \infty} \mathrm{Prob}\{\varphi_n(x) > a\} = \delta_B(x),$

for all $x \in S$ with the possible exception of values of $x$ for which $E(\Lambda \mid X = x) = a$.
Hence

(4.8)    $\lim_{n \to \infty} E\{\delta_n(X)[\Lambda - a] \mid X = x\} = \lim_{n \to \infty} [E\delta_n(x)][E(\Lambda \mid X = x) - a] = \delta_B(x)[E(\Lambda \mid X = x) - a],$

for all $x \in S$. Also, since $0 \le \delta_n(x) \le 1$ for all values of $x$ and $n$, we have

(4.9)    $|[E\delta_n(x)][E(\Lambda \mid X = x) - a]| \le |E(\Lambda \mid X = x)| + |a|$, all $x$, $n$.

Hence by (4.8), (4.9) and the Lebesgue Dominated Convergence Theorem we
have

(4.10)    $\lim_{n \to \infty} R(\delta_n) = R(\delta_B),$

whenever the a priori probability measure is such that (4.5) holds and $E|\Lambda| < \infty$.
Example 2. (Two-sided alternatives): Suppose now that we wish to test the
hypothesis $H_0^*: \lambda \in (a - b, a + b)$, $b > 0$, versus the alternative hypothesis
$H_1^*: \lambda \notin (a - b, a + b)$, where, as before, $\lambda$ is a value of the random variable
$\Lambda$ which is associated with the vector of observations $X$. Let the loss function
$L^*$ be defined by

(4.11)    $L^*(A_i, \lambda) = \max(0, [(\lambda - a)^2 - b^2])$ for $i = 0$,
          $L^*(A_i, \lambda) = -\min(0, [(\lambda - a)^2 - b^2])$ for $i = 1$,

where, as before, $A_i$ represents the action of accepting the hypothesis $H_i^*$, $i = 0, 1$.
The graph of $L^*$ is shown in Fig. 1. For any decision function $\delta(x) = \mathrm{Prob}\{A_1 \mid X = x\}$,
the risk is

(4.12)    $R(\delta) = E L^*(A_0, \Lambda) - E\{\delta(X)[L^*(A_0, \Lambda) - L^*(A_1, \Lambda)]\}$
          $= E L^*(A_0, \Lambda) - E\{\delta(X)[(\Lambda - a)^2 - b^2]\}$
          $= E L^*(A_0, \Lambda) - E\{\delta(X)[E(\Lambda^2 \mid X) - 2aE(\Lambda \mid X) + a^2 - b^2]\}.$

Hence the Bayes decision function $\delta_B^*(x)$ minimizing $R(\delta)$ is given by

(4.13)    $\delta_B^*(x) = 1$ if $E(\Lambda^2 \mid X = x) - 2aE(\Lambda \mid X = x) > b^2 - a^2$,
          $\delta_B^*(x) = 0$ otherwise.

Now if the a priori probability measure is not known we may still be able, under
certain circumstances, to find empirical Bayes estimators $\varphi_n^{(1)}(x)$ and $\varphi_n^{(2)}(x)$
based on prior independent observations, such that as $n \to \infty$,

(4.14)    $\varphi_n^{(1)}(x) \to E(\Lambda \mid X = x)$, in probability,

and

(4.15)    $\varphi_n^{(2)}(x) \to E(\Lambda^2 \mid X = x)$, in probability,

for all $x \in S$ where $S$, as before, is assigned probability 1 under the distribution
of $X$. Then if we define the empirical Bayes decision function $\delta_n^*(x)$ by

(4.16)    $\delta_n^*(x) = 1$ if $\varphi_n^{(2)}(x) - 2a\varphi_n^{(1)}(x) > b^2 - a^2$,
          $\delta_n^*(x) = 0$ otherwise,

we have (by the same argument as in Example 1),

(4.17)    $\lim_{n \to \infty} R(\delta_n^*) = R(\delta_B^*),$

for any a priori probability measure such that $E\Lambda^2 < \infty$ and (4.14) and (4.15)
hold.
The existence of empirical Bayes estimators satisfying (4.14) follows directly
(as in Example 1) from the results of the previous sections. We remark that if
we assume that $E h^4(X) < \infty$ in the cases treated in Secs. 2 and 3, then an empirical
Bayes estimator satisfying (4.15) can be obtained whenever the number of
components in the vector $X_i$ exceeds the number in $X$ by at least 2 for all $i$.
To see this we observe that

(4.18)    $E(h(X_{1,r+1})\, h(X_{1,r+2}) \mid X_1^{(r)} = x) = E(\Lambda^2 \mid X = x),$

so that (in the notation of Sec. 2) if we let

(4.19)    $\varphi_n^{(2)}(x) = \frac{1}{M_n(x)} \sum_{i=1}^{n} M_i(x)\, h(X_{i,r+1})\, h(X_{i,r+2})$ if $M_n(x) > 0$,
          $\varphi_n^{(2)}(x) = 0$ otherwise,

we can show (by arguments paralleling those of Sec. 2) that if $E h^4(X) < \infty$ then

(4.20)    $\lim_{n \to \infty} E[\varphi_n^{(2)}(x) - E(\Lambda^2 \mid X = x)]^2 = 0,$

for all $x \in S$, which implies (4.15).


5. General remarks. The methods of this paper clearly may be modified to 
apply to compound Bayes decision problems where the component problems 
are of one of the types considered above and where the compound risk is the 
average of the component risks. Robbins has conjectured in [3] that empirical 
Bayes solutions of such compound problems will often lead to asymptotically 
subminimax solutions for the corresponding compound decision problems where 
no a priori probability measure is assumed to exist. We may surmise therefore 
that suitable modifications of the techniques given here are applicable to such 
problems. 
ON QUEUES WITH POISSON ARRIVALS


By V. E. Beneš
Bell Telephone Laboratories, Incorporated


1. Summary. The system to be studied consists of a service unit and a queue
of customers waiting to be served. Service-times of customers are independent,
nonnegative variates with the common distribution $B(v)$ having a finite first
moment $b_1$. Customers arrive in a Poisson process (see Feller [4], p. 364) of intensity
$\lambda$; they form a queue and are served in order of arrival, with no defections
from the queue. For previous work on this queueing system see for instance
Pollaczek [11], Khintchine [9], Lindley [10], Kendall [7], [8], Smith [12], Bailey
[1], and Takács [14].

Let $W(t)$ be the time a customer would have to wait if he arrived at $t$. The
forward Kolmogorov equation for the distribution of $W(t)$ is solved in principle
by the use of Laplace integrals, and $E\{\exp\{-sW(t)\}\}$ is determined in terms of
$W(0)$ and the root of a possibly transcendental equation. It is shown that any
analytic function of the root can be expanded in Lagrange's series, which provides
a way of actually computing the transition probabilities of the process.
Let $z$ be the first zero of $W(t)$. A series for $E\{\exp\{-\tau z\}\}$ is obtained, and it is
proved that $\mathrm{pr}\{z < \infty\} = 1$ if and only if $\lambda b_1 \le 1$. From a functional relation
between $E\{W(t)\}$ and $\mathrm{pr}\{W(t) = 0\}$ the covariance function $R$ of $W(t)$ is determined.
If the service-time distribution $B(v)$ has a finite third moment, then $R$
is absolutely integrable, and the spectral distribution of $W(t)$ is absolutely continuous.


2. The distributions of waiting-time and busy-time. Let $W(t)$ be the instantaneous
waiting-time. That is, let $W(t)$ be the time that a customer arriving
at $t$ would have to wait before beginning his service. Evidently $W(t)$ jumps upward
discontinuously every time someone arrives who has a nonzero service
time. Otherwise $W(t)$ approaches 0 with slope $-1$ until it jumps again or reaches
0. At 0 it stays equal to 0 until another jump occurs. The magnitudes of the jumps are
the (independent) service-times of the customers arriving at the jumps. $W(t)$ is a
continuous parameter Markov process of the mixed type considered in Feller
[5]. Let $P(w, t) = \mathrm{pr}\{W(t) \le w\}$. As shown in Takács [14], the forward Kolmogorov
equation of the process is

(2.1)    $\frac{\partial P(w, t)}{\partial t} = \frac{\partial P(w, t)}{\partial w} - \lambda P(w, t) + \lambda \int_{0-}^{w} P(w - v, t)\, dB(v),$

and if $\varphi(s, t) = E\{\exp\{-sW(t)\}\}$ for $\mathrm{Re}(s) \ge 0$, we obtain

(2.2)    $\frac{\partial}{\partial t} \varphi(s, t) = \varphi(s, t)[s - \lambda(1 - B^*(s))] - sP(0, t),$
where $B^*(s)$ is the Laplace-Stieltjes transform of $B(v)$. Let $\varphi^*(s, \tau)$ be the Laplace
transform of $\varphi$ with respect to $t$, and let $P^*(\tau)$ be that of $P(0, t)$.
Then the Kolmogorov equation implies

(2.3)    $\varphi^*(s, \tau) = \frac{\varphi(s, 0) - sP^*(\tau)}{\tau - s + \lambda[1 - B^*(s)]}.$

By the busy-time, we mean the epoch of the first zero of $W(t)$ subsequent to 0,
when $W(0)$ is any admissible starting point. The busy-time can be investigated
in terms of a modified process which is like $W(t)$ except that it stays at 0 once it
arrives there. Let $z$ be the first zero in $t \ge 0$, and define

    $Z(t) = W(t)$, for $t \le z$,
    $Z(t) = 0$, for $t > z$.

Let $F(u, t) = \mathrm{pr}\{Z(t) \le u\}$; the forward Kolmogorov equation for the process
is

(2.4)    $\frac{\partial F(u, t)}{\partial t} = \frac{\partial F(u, t)}{\partial u} - \lambda[F(u, t) - F(0, t)] + \lambda \int_{0}^{u} F(u - v, t)\, dB(v) - \lambda F(0, t) B(u).$

Let $\psi(s, t) = E\{\exp\{-sZ(t)\}\}$; let $\psi^*(s, \tau)$ be the Laplace transform (with
respect to $t$) of $\psi$, and also let $F^*(\tau) = E\{e^{-\tau z}\}$, $f^* = \tau^{-1} F^*$, for $\mathrm{Re}(\tau) > 0$.
Then the Kolmogorov equation (2.4) yields

(2.5)    $\psi^*(s, \tau) = \frac{\psi(s, 0) - f^*[s - \lambda + \lambda B^*(s)]}{\tau - s + \lambda[1 - B^*(s)]}.$

To solve for the unknown functions $P^*$ and $F^*$ we argue that the transforms
$\varphi^*$ and $\psi^*$ must converge (cf. Bailey [1]) for $\mathrm{Re}(s) > 0$ whenever $\mathrm{Re}(\tau) > 0$,
and that in this region zeros of $\tau - s + \lambda[1 - B^*]$ must coincide with zeros of
the respective numerators. We show that there is a unique zero $\eta(\tau)$ of

    $\tau - s + \lambda[1 - B^*(s)]$

in $\mathrm{Re}(s) > 0$ for $\mathrm{Re}(\tau) > 0$. Choose a real $\delta$ such that $0 < \delta < \mathrm{Re}(\tau)$, and a real
$\epsilon > \mathrm{Re}(\tau)$. Consider the line $\mathrm{Re}(s) = \delta$, and the circle, with center at $\tau + \lambda$,
defined by $|\tau - s + \lambda| = \lambda + \epsilon$. Define a contour $C$ to be the circle when
$\mathrm{Re}(s) > \delta$, and to be the line when $\mathrm{Re}(s) = \delta$. On the circle, we have the inequality

    $|\tau - s + \lambda| = \lambda + \epsilon > \lambda \ge |\lambda B^*(s)|.$

And on the line $\mathrm{Re}(s) = \delta$:

    $|\tau - s + \lambda| \ge \mathrm{Re}(\tau) - \delta + \lambda > \lambda \ge |\lambda B^*(s)|,$

so that the inequality

    $|\tau - s + \lambda| > |\lambda B^*(s)|$

holds over the whole contour $C$. Now $\tau - s + \lambda$ has no zeros on $\mathrm{Re}(s) = \delta$,
nor any on the circle $|\tau - s + \lambda| = \lambda + \epsilon$; and $B^*(s)$ is single-valued and
analytic in $\mathrm{Re}(s) > 0$. So by Rouché's theorem we conclude that $\tau - s + \lambda$
and $\tau - s + \lambda - \lambda B^*(s)$ have the same number of zeros in $\mathrm{Re}(s) > 0$, namely
one, because $\delta$ can be made arbitrarily small, and $\epsilon$ arbitrarily large.
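
For real $\tau > 0$ the root can also be found numerically. The sketch below is only an illustration and is not the Lagrange expansion used in the text: it applies the fixed-point iteration $\eta \leftarrow \tau + \lambda - \lambda B^*(\eta)$, and it assumes exponential service with rate `mu` (so $B^*(s) = \mu/(\mu + s)$) with $\lambda < \mu$, in which case the iteration is a contraction on $\eta \ge 0$. All names are hypothetical.

```python
import math


def solve_eta(tau, lam, lst_service, iterations=200):
    """Iterate eta <- tau + lam - lam * B*(eta), starting from tau + lam."""
    eta = tau + lam
    for _ in range(iterations):
        eta = tau + lam - lam * lst_service(eta)
    return eta


def exponential_lst(mu):
    """Laplace-Stieltjes transform of an exponential(mu) service time."""
    return lambda s: mu / (mu + s)


def eta_closed_form(tau, lam, mu):
    """Positive root of eta^2 + (mu - tau - lam) * eta - tau * mu = 0,
    which is the root equation specialised to exponential service."""
    c = mu - tau - lam
    return (-c + math.sqrt(c * c + 4.0 * tau * mu)) / 2.0


if __name__ == "__main__":
    lam, mu, tau = 1.0, 2.0, 0.5
    print(solve_eta(tau, lam, exponential_lst(mu)), eta_closed_form(tau, lam, mu))
```

For other service-time distributions only `lst_service` needs to change, although convergence of this simple iteration is then an additional assumption to be checked.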

It follows that, with $\eta = \eta(\tau)$,

(2.6)    $P^*(\tau) = \frac{\varphi(\eta, 0)}{\eta} = \frac{E\{e^{-\eta W(0)}\}}{\eta},$

(2.7)    $F^*(\tau) = \psi(\eta, 0) = E\{e^{-\eta Z(0)}\}.$

In the proof above we saw that $|\tau - s + \lambda| > |\lambda B^*(s)|$, if $s$ is on the contour
$C$. So by Lagrange's expansion (p. 133 of [16]), for any function $T$ analytic on
and inside $C$, we have

(2.8)    $T(\eta) = T(\tau + \lambda) + \sum_{n=1}^{\infty} \frac{(-\lambda)^{n}}{n!} \left[ \frac{d^{n-1}}{ds^{n-1}} \{T'(s) [B^*(s)]^{n}\} \right]_{s = \tau + \lambda};$

this expansion is valid for $\mathrm{Re}(\tau) > 0$, and provides a way of actually evaluating
$P^*$ and $F^*$. Except for the matter of inverting transforms, the solutions for the
distributions of $W(t)$, $Z(t)$, and $z$ are complete. It is easy to see that both $\varphi^*$
and $\psi^*$ can be inverted explicitly as Laplace transforms with respect to $\tau$ and
give rise to exponential functions which, together with $P^*$ and $F^*$, determine
$\varphi$ and $\psi$ in terms of initial conditions.

The results of this section may be collected in the following statements. 

THEOREM 1. The function $\varphi(s, t) = E\{\exp\{-sW(t)\}\}$ is determined as the solution
to Eq. (2.2) by the conditions

(i)     $\varphi(s, t) = \varphi(s, 0) \exp\{st - \lambda t[1 - B^*]\} - s \int_{0}^{t} P(0, t - y) \exp\{sy - \lambda y[1 - B^*]\}\, dy,$

(ii)    $P^*(\tau) = \int_{0}^{\infty} e^{-\tau t} P(0, t)\, dt = \eta^{-1} \varphi(\eta, 0) = \eta^{-1} E\{e^{-\eta W(0)}\},$

where $\eta = \eta(\tau)$ is the unique root of $\tau - \eta + \lambda = \lambda B^*(\eta)$ in the right half-plane.

THEOREM 2. The function $\psi(s, t) = E\{\exp\{-sZ(t)\}\}$ is determined by the conditions

(i)     $\psi(s, t) = \psi(s, 0) \exp\{st - \lambda t[1 - B^*]\} - [s - \lambda + \lambda B^*] \int_{0}^{t} F(0, t - y) \exp\{sy - \lambda y[1 - B^*]\}\, dy,$

(ii)    $\int_{0}^{\infty} e^{-\tau t} F(0, t)\, dt = \tau^{-1} \psi(\eta, 0) = \tau^{-1} E\{e^{-\eta Z(0)}\},$

where $\eta$ is as in Theorem 1.

THEOREM 3. If $T$ is analytic in the open right half-plane and $\eta$ is as in Theorem
1, then $T(\eta)$ may be expanded by Lagrange's series.


3. The probability that z is finite. Since $F^* = E\{e^{-\tau z}\} = \psi(\eta, 0)$ where $\eta$ is
the root of $\tau - \eta + \lambda[1 - B^*(\eta)] = 0$, it seems natural to consider Tauberian
and Abelian theorems in an effort to find the probability that $z$ is finite, and to
ascertain the existence of moments. We therefore turn attention to the behavior
of $\eta$ as $\tau \to 0$ along the real axis. There is an advantage to considering, instead
of $\eta$, the linear function $\xi$ of it defined by $\eta = \lambda(1 - \xi)$. Set $K(s) = B^*(\lambda - \lambda s)$,
so that $K$ generates a discrete probability distribution with mean $\lambda b_1$; the equation
for $\eta$ may now be rewritten

    $\tau = \lambda[K(\xi) - \xi],$

and this fact suggests that as $\tau \to 0$ along the real axis, $\xi$ approaches a root of the
familiar equation from branching-process theory, $\xi = K(\xi)$. Let $\zeta$ be the least
nonnegative real root of $\xi = K(\xi)$; for properties of $\zeta$ see Feller [4] or Harris [6].
We now show that $\xi \to \zeta$ as $\tau \to 0$ along the real axis.

If $\tau$ is real then so is $\xi$; for if not, then the conjugate of $\eta$ is also a root and $\eta$ is not unique. Also,
if $\tau > 0$, then $\xi < \zeta$, because $\tau > 0$ implies $K(\xi) > \xi$, and in $\xi < 1$ this is
possible only if $\xi < \zeta$, since $K(0) > 0$ and $\xi = K(\xi)$ has at most two roots in
$(0, 1]$, one of them being at 1. To show that $0 < \tau < \tau'$ implies $\xi(\tau) > \xi(\tau')$,
write $\xi = \xi(\tau)$, $\xi' = \xi(\tau')$. Then the hypothesis and $\xi < \zeta$, $\xi' < \zeta$ imply

    $K(\xi) - K(\xi') < \xi - \xi'.$

Now $K'(y)$ is steadily increasing in $0 < y < 1$, so if for some $u$ we have both
$u < \zeta$ and $K'(u) > 1$, then $K(u) > u$ and $K(1) > 1$; so $K'(y) \le 1$ for $y < \zeta$.
Now if $\xi \le \xi'$, this would imply

    $K(\xi') - K(\xi) \le \xi' - \xi,$

which is impossible. It remains to show that given $u < \zeta$, there exists a $\tau > 0$
such that $\xi(\tau) > u$. The equation $x = \lambda[K(u) - u]$ uniquely determines an
$x > 0$, and for this $x$ we must have $\xi(x) = u$, or else $\tau - \eta + \lambda[1 - B^*(\eta)] = 0$
does not have a unique root $\eta$. If now $0 < \tau < x$, then $\xi(\tau) > \xi(x) = u$, as was
to be proved.

It follows that as $\tau \to 0$ along the real axis, $\eta \to 0$ or $\lambda(1 - \zeta)$ according as
$\lambda b_1 \le 1$ or $\lambda b_1 > 1$. We are now in a position to prove that

    $\mathrm{pr}\{z < \infty\} = \lim_{t \to \infty} F(0, t) = E\{\exp\{\lambda(\zeta - 1) Z(0)\}\}.$

As $\tau \to 0$ along the real axis, the continuity of $\psi$ yields

    $F^*(\tau) \to \psi(\lambda(1 - \zeta), 0) = E\{\exp\{\lambda(\zeta - 1) Z(0)\}\}.$

Since $F(0, t)$ is nondecreasing, so is

    $\int_{0}^{t} x\, F(0, dx);$

thus $F(0, t)$ and $F^*$ satisfy the hypothesis of Theorem 4.5 of Widder [15]. This
proves

THEOREM 4. The probability that the first zero $z$ of $W(t)$ is finite is

    $\mathrm{pr}\{z < \infty\} = \lim_{\tau \to 0} F^*(\tau) = E\{\exp\{\lambda(\zeta - 1) Z(0)\}\}.$

This limit is 1 if $\lambda b_1 \le 1$, and is $< 1$ if $\lambda b_1 > 1$.
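
Theorem 4 can be illustrated numerically. The sketch below is a hedged example, not part of the original argument: it assumes exponential service with rate `mu`, for which $K(s) = \mu / (\mu + \lambda - \lambda s)$ and the least nonnegative root is $\min(1, \mu/\lambda)$, and it assumes a fixed initial load $Z(0) = w$. The root is found by iterating $\xi \leftarrow K(\xi)$ from 0, which increases monotonically to the least nonnegative root. All names are hypothetical.

```python
import math


def least_root(K, iterations=500):
    """Iterate xi <- K(xi) from xi = 0; for an increasing K with K(0) > 0
    the iterates increase to the least nonnegative root of xi = K(xi)."""
    xi = 0.0
    for _ in range(iterations):
        xi = K(xi)
    return xi


def exponential_K(lam, mu):
    """K(s) = B*(lam - lam * s) for exponential(mu) service times."""
    return lambda s: mu / (mu + lam - lam * s)


def prob_busy_time_finite(lam, zeta, w):
    """exp{lam * (zeta - 1) * w}: Theorem 4 with Z(0) = w fixed."""
    return math.exp(lam * (zeta - 1.0) * w)


if __name__ == "__main__":
    for lam, mu in ((1.0, 2.0), (2.0, 1.0)):
        zeta = least_root(exponential_K(lam, mu))
        print(lam, mu, zeta, prob_busy_time_finite(lam, zeta, w=1.0))
```

In the critical case $\lambda b_1 = 1$ the iteration still converges to 1 but only slowly, so many more iterations would be needed there.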
A discussion similar to the above has been given by Takács [14] for the case
$\varphi(s, 0) = B^*(s)$, and this case is also treated by Kendall [7]. We mention in
addition the following results, provable by simple Abelian arguments: If $\lambda b_1 < 1$
and $E\{Z(0)\} < \infty$, then

    $E\{z\} = \frac{E\{Z(0)\}}{1 - \lambda b_1};$

if $\lambda b_1 \ge 1$, or $E\{Z(0)\} = \infty$, then $E\{z\} = \infty$.


4. The expectation of W(t).
THEOREM 5. If $E\{W(0)\} < \infty$, then $E\{W(t)\}$ exists for $t > 0$ and is given by

(4.1)    $E\{W(t)\} = E\{W(0)\} + \int_{0}^{t} [P(0, u) - 1 + \lambda b_1]\, du.$

A result similar to this appears in Clarke [2]. To prove (4.1), we differentiate
(2.2) with respect to $s$, and let $s \to 0$. From (4.1) we see that if $\lambda b_1 > 1$, then $d/dt\, E\{W(t)\}$
is positive and bounded away from 0, so that $E\{W(t)\}$ increases indefinitely.
Let $M^*(\tau)$ be the Laplace transform with respect to $t$ of $E\{W(t)\}$; then
(4.1) implies

    $M^*(\tau) = \frac{E\{W(0)\} + P^*}{\tau} - \frac{1 - \lambda b_1}{\tau^2} = \frac{E\{W(0)\}}{\tau} + \frac{1}{\tau \eta} E\{\exp\{-\eta W(0)\}\} - \frac{1 - \lambda b_1}{\tau^2}.$

From this it can be shown that if $B(v)$ has a finite second moment $b_2$, and
$\lambda b_1 < 1$, then

    $\lim_{t \to \infty} E\{W(t)\} = \lim_{\tau \to 0} \tau M^* = \frac{\lambda b_2}{2(1 - \lambda b_1)}.$
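
The limiting mean waiting-time above depends on the service-time distribution only through its first two moments. The following short sketch evaluates it; the function name is illustrative, and the exponential-service check uses the standard closed form $\lambda / (\mu(\mu - \lambda))$ for service rate $\mu$.

```python
def limiting_mean_wait(lam, b1, b2):
    """lam * b2 / (2 * (1 - lam * b1)), valid only when lam * b1 < 1."""
    load = lam * b1
    if load >= 1.0:
        raise ValueError("requires lam * b1 < 1; otherwise E{W(t)} has no finite limit")
    return lam * b2 / (2.0 * (1.0 - load))


if __name__ == "__main__":
    lam, mu = 1.0, 2.0
    # Exponential service: b1 = 1/mu, b2 = 2/mu**2.
    print(limiting_mean_wait(lam, 1.0 / mu, 2.0 / mu ** 2))
    # Constant service time 1/mu: b1 = 1/mu, b2 = 1/mu**2.
    print(limiting_mean_wait(lam, 1.0 / mu, 1.0 / mu ** 2))
```

With the same mean service time, constant service gives half the limiting mean wait of exponential service, since $b_2$ is halved.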

5. The stationary distribution. We call an initial distribution $P(w, 0)$ of $W(0)$
stationary if it is invariant under the transition probabilities for $W(t)$, that is,
when $P(w, t) = P(w, 0)$ for all $t$ and $w$. Let $A(w)$ be the distribution whose
Laplace-Stieltjes transform is given by the Pollaczek-Khintchine formula

    $A^*(s) = \frac{s(1 - \lambda b_1)}{s - \lambda[1 - B^*(s)]}, \qquad \lambda b_1 < 1.$

We show that $A(w)$ is the unique stationary distribution. From (2.3) we see
that a given $P(w, 0)$ is stationary if and only if the corresponding $\varphi(s, 0)$ satisfies

    $\frac{\varphi(s, 0)}{\tau} = \frac{\varphi(s, 0) - [s \varphi(\eta, 0)] / \eta}{\tau - s + \lambda[1 - B^*(s)]},$

    $P^* = \frac{\varphi(\eta, 0)}{\eta} = \frac{P(0, 0)}{\tau}.$

These imply

    $\varphi(s, 0) = \frac{s P(0, 0)}{s - \lambda[1 - B^*]},$

and a simple Abelian argument proves $P(0, 0) = 1 - \lambda b_1$. This shows that $A(w)$
is unique.
To invert the transform $A^*(s)$ explicitly, we write it as

    $\frac{1 - \lambda b_1}{1 - \lambda b_1 (1 - B^*) / b_1 s},$

and notice that since $B(v)$ is the distribution of a nonnegative random variable
with mean $0 < b_1 < \infty$, therefore

    $\frac{1 - B^*}{b_1 s}$

is the Laplace transform of the density function

    $h(v) = \frac{U(v) - B(v)}{b_1},$

where $U(v)$ is the unit step at 0. Define

    $H_0(w) = U(w),$

    $H_{n+1}(w) = \int_{0}^{w} H_n(w - v) h(v)\, dv.$

Then $A^*(s)$ may be inverted, and we have proved
THEOREM 6. $A(w)$ is the unique stationary distribution of $W(0)$. It may be written
as

    $A(w) = (1 - \lambda b_1) \sum_{n=0}^{\infty} (\lambda b_1)^{n} H_n(w),$

which shows that $A(w)$ is decomposable into a single step of magnitude $1 - \lambda b_1$
at 0 and an absolutely continuous portion, and that the equilibrium solution of
Pollaczek and Khintchine has the form of a compound geometric distribution,
i.e.,

    $W(0) = \sum_{i=1}^{k} x_i,$

where the sum is 0 when $k = 0$, the $x$'s are mutually independent with the common density $h(v)$, and
$\mathrm{pr}\{k = n\} = (1 - \lambda b_1)(\lambda b_1)^{n}$.


6. The covariance function and the spectral distribution. In this last section
we assume that $\lambda b_1 < 1$, and that $W(0)$ has the stationary distribution $A(w)$.
Let $E\{W^n(0)\} = a_n$, when this exists. The covariance function $R(t)$ of the
process is

(6.1)    $R(t) = \int_{0-}^{\infty} w\, E\{W(t) \mid W(0) = w\}\, dA(w) - a_1^2,$

and the Laplace transform of $R(t)$ is

(6.2)    $R^*(\tau) = \int_{0-}^{\infty} w \left[ \frac{w}{\tau} + \frac{e^{-\eta w}}{\tau \eta} - \frac{1 - \lambda b_1}{\tau^2} - \frac{a_1}{\tau} \right] dA(w).$

Intuitively one might expect that in view of the Poisson arrival process and 
the independent service-times, the W(t) process would have no periodic com- 
ponents, and thus a smooth spectral distribution. That this is so under weak 
conditions is a consequence of the following: 

THEOREM 7. If $\lambda b_1 < 1$, and $B(v)$ has a finite moment $b_3$ of the third order, then

(6.3)    $\int_{0}^{\infty} |R(t)|\, dt < \infty.$

To prove that $R(t)$ is $L_1(0, \infty)$, we show that $R^*(\tau)$ is the Laplace-Stieltjes
transform of an absolutely continuous (AC) function of bounded total variation
(BTV). We make use of the following result: Let $u$ be a nonnegative random
variable such that $E\{u\}$ and $E\{u^2\}$ both exist; then

    $E\{e^{-\tau y}\} = \frac{1 - E\{e^{-\tau u}\}}{\tau E\{u\}}$

defines a unique variate $y \ge 0$ such that distr $\{y\}$ is AC and $E\{y\} = E\{u^2\} / 2E\{u\}$.

By differentiating $A^*(s)$ successively, it can be verified that if $b_3$ exists, so do
$a_1$ and $a_2$, and $a_1 = \lambda b_2 / 2(1 - \lambda b_1)$. Let $N^* = \lambda B^*(\eta) / (\tau + \lambda)$, and let
aa eke [(re™") /(r + d)] 
*  frw/ — XbDd] + 7” 
1 — [AC — db) (1 — N*)| ‘rt 


[ra,/(1 — db)| + rrA7 ? 





1; = 


By use of » = 7 + \ — AB*(n) and algebra we can write the integrand of 
(6.2) as 


wd "[w(1 — Abi) + NIAC — AB)(L — N*)e 7" — TT] — N*)" 
—wr™"fa,(1 — Abi) + ATYACL — Abi) — N*)r? — TH — N*)". 


By Taylor series arguments and repeated use of the result stated earlier, it can
be shown that if $b_3$ exists, then each of $T_1$, $T_2$, and $N^*$ is $E\{\exp\{-\tau y\}\}$ for some
suitable $y \ge 0$, such that $E\{y\} < \infty$ and distr {y} is AC. It follows from Lemma
5 of Smith [13] that for each w, the integrand of (6.2) is the Laplace-Stieltjes
transform of an AC function of BTV. Therefore $R^*(\tau)$ is also.

From (6.3), and from the remarks on p. 522 of Doob [3], it follows that if
$\lambda b_1 < 1$, and B(v) has a finite moment $b_3$ of third order, then W(t) has an
absolutely continuous spectral distribution. The associated spectral density g(x)
is continuous and is given by


$g(x) = 4\int_0^{\infty} R(t)\cos 2\pi x t\,dt = 4\,\mathrm{Re}\{R^*(2\pi i x)\},$


since R* is well defined along the imaginary axis.
IDEMPOTENT MATRICES AND QUADRATIC FORMS IN
THE GENERAL LINEAR HYPOTHESIS
1. Introduction. The important role that idempotent matrices play in the
general linear hypothesis theory has long been recognized ([1], [2]), but their
usefulness seems not to have been fully exploited. The purpose of this paper is 
to state and prove some theorems about idempotent matrices and to point out 
how they might be used to advantage in linear hypothesis theory. 


2. Notation and Definitions. Throughout this paper an idempotent matrix 
will mean a symmetric matrix A such that AA = A (for the sake of brevity we 
will use the word idempotent matrix to indicate a symmetric idempotent matrix 
unless specifically stated otherwise). The theorems will not necessarily hold 
for nonsymmetric idempotent matrices. The statement: Y is distributed as
$N_p(\mu, V)$, will mean that a (p × 1) random vector Y has the p-variate normal
distribution whose mean is the (p × 1) vector, $\mu$, and whose covariance matrix is
the positive definite symmetric matrix, V. The statement: u is distributed as $\chi^2(n)$
will mean that a scalar random variable u has the Chi-square distribution with n
degrees of freedom, and the statement: v is distributed as $\chi'^2(n, \lambda)$ will mean
that the scalar random variable v is distributed as the noncentral Chi-square
distribution with n degrees of freedom and with noncentrality, $\lambda$. The frequency
function of v is ([3])


$f(v) = \sum_{i=0}^{\infty} \dfrac{e^{-\lambda}\lambda^i}{i!}\,\dfrac{v^{(n+2i-2)/2}\,e^{-v/2}}{2^{(n+2i)/2}\,\Gamma\left(\frac{n+2i}{2}\right)}, \qquad 0 \le v < \infty.$


If $\lambda = 0$, then the noncentral Chi-square distribution degenerates into the
central Chi-square distribution.
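
The series form of the frequency function lends itself to a direct numerical check. The following is a minimal sketch, assuming the Poisson-mixture form of the noncentral Chi-square density with noncentrality $\lambda = \frac{1}{2}\mu'\mu$ as in Theorem F; the function names, the truncation at 60 terms, and the values n = 3, $\lambda$ = 1.5 are illustrative choices, not taken from the paper. It verifies that the density integrates to one and has mean $n + 2\lambda$.

```python
import math

def noncentral_chi2_pdf(v, n, lam, terms=60):
    """Truncated Poisson-mixture series for the noncentral chi-square density."""
    if v <= 0.0:
        return 0.0
    total = 0.0
    for i in range(terms):
        if lam == 0.0 and i > 0:
            break  # only the central term survives when lam = 0
        # log of the Poisson weight exp(-lam) * lam**i / i!
        log_weight = -lam - math.lgamma(i + 1) + (i * math.log(lam) if i > 0 else 0.0)
        half_df = (n + 2 * i) / 2.0
        # log of the central chi-square density with n + 2i degrees of freedom
        log_central = ((half_df - 1.0) * math.log(v) - v / 2.0
                       - half_df * math.log(2.0) - math.lgamma(half_df))
        total += math.exp(log_weight + log_central)
    return total

def trapezoid(g, upper=80.0, steps=4000):
    """Trapezoidal rule on [0, upper]."""
    h = upper / steps
    s = 0.5 * (g(0.0) + g(upper))
    for k in range(1, steps):
        s += g(k * h)
    return s * h

n, lam = 3, 1.5  # illustrative values
total_mass = trapezoid(lambda v: noncentral_chi2_pdf(v, n, lam))
mean = trapezoid(lambda v: v * noncentral_chi2_pdf(v, n, lam))
```

With $\lambda = \frac{1}{2}\mu'\mu$ the mean of Y'Y is $n + \mu'\mu = n + 2\lambda$, which is the value `mean` should approximate.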

A' will indicate the transpose of the matrix A, and $A^{-1}$ will indicate the
inverse. $I_p$ will indicate the (p × p) identity matrix and $\phi$ will indicate a null
matrix. Below is a list of well-known theorems which will be needed in the
succeeding sections.

THEOREM A. If A is an (n × n) symmetric matrix of rank p, then a necessary
and sufficient condition that A is idempotent is that each of p of the characteristic
roots of A is equal to unity and the remaining (n − p) characteristic roots are equal
to zero.

THEOREM B. If A is an idempotent matrix, then the rank of A equals the trace
of A.

THEOREM C. The only nonsingular idempotent matrix is the identity matrix.

THEOREM D. If A is an (n × n) idempotent matrix of rank p such that p < n
(p = n), then A is a positive semidefinite matrix (positive definite matrix).
Received March 2, 1956; revised January 17, 1957. 
THEOREM E. If A is an idempotent matrix whose ith diagonal element is equal
to zero, then every element in the ith row and ith column of A is equal to zero.

THEOREM F. If Y is distributed as $N_n(\mu, I_n)$, then v = Y'Y is distributed as
$\chi'^2(n, \lambda)$, where $\lambda = \frac{1}{2}\mu'\mu$.

THEOREM G. In Theorem F, the moment generating function of v is


$m_v(t) = (1 - 2t)^{-n/2}\,e^{-\lambda + \lambda(1 - 2t)^{-1}}.$


THEOREM H. If Y is distributed as $N_n(\mu, I_n)$, then a necessary and sufficient
condition that $Y'B_1Y, Y'B_2Y, \cdots, Y'B_kY$ be jointly independent is that $B_iB_j = \phi$
for all $i \ne j$.

THEOREM J. If $B_1, B_2, \cdots, B_k$ are a set of (n × n) symmetric matrices, then a
necessary and sufficient condition that there exists an orthogonal matrix, P, such
that $P'B_1P, P'B_2P, \cdots, P'B_kP$ are each diagonal is that $B_iB_j = B_jB_i$ for all i
and j.

THEOREM K. Let $B_1, B_2, \cdots, B_m$ be a collection of (n × n) symmetric matrices
such that $\sum_{i=1}^{m} B_i = I_n$. Then any one of the conditions $K_1$, $K_2$, $K_3$ is necessary
and sufficient for the remaining two.

$K_1$: Each $B_i$ is an idempotent matrix.

$K_2$: $B_iB_j = \phi$ for all $i \ne j$.

$K_3$: $\sum_{i=1}^{m} n_i = n$ where $n_i$ is the rank of $B_i$.

THEOREM L. If v is distributed as $\chi'^2(n, \lambda)$ and w is distributed as $\chi^2(m)$, and if
v and w are independent, then $u = (v/w)\cdot(m/n)$ is distributed as $F'(n, m, \lambda)$ where
$F'(n, m, \lambda)$ refers to the noncentral F distribution with n degrees of freedom for
numerator and m degrees of freedom for the denominator and noncentrality, $\lambda$. The
functional form is


$f(u) = \sum_{i=0}^{\infty} \dfrac{e^{-\lambda}\lambda^i}{i!}\;\dfrac{\Gamma\left(\frac{n+m+2i}{2}\right)}{\Gamma\left(\frac{n+2i}{2}\right)\Gamma\left(\frac{m}{2}\right)}\left(\dfrac{n}{m}\right)^{(n+2i)/2}\dfrac{u^{(n+2i-2)/2}}{\left(1 + \frac{n}{m}u\right)^{(n+m+2i)/2}}.$


This reduces to Snedecor's F if and only if $\lambda = 0$.


3. Theory. Let an observation vector, Y, be distributed as $N_n(X\beta, \sigma^2 I_n)$,
where X is an (n × p) (p < n) matrix with known elements and rank p, $\beta$ is a
(p × 1) vector of unknown parameters, and $\sigma^2$ is an unknown scalar. Y is often
assumed to have this structure in models which are referred to as multiple
regression models and in linear models used in the theory of experimental designs.
In these models it is often desired to test hypotheses about elements of the vector
$\beta$. The technique often employed to devise test functions is the technique of
analysis of variance. The procedure is to partition the total sum of squares Y'Y
of the observation vector, Y, into quadratic forms such that


(1) $Y'Y = \sum_{i=1}^{k} Y'A_iY$


and use Cochran's theorem ([5]) to ascertain the independence and distribution
of the quantities $Y'A_iY$. This process is quite well known and will not be
explained here except to say that to use Cochran's theorem it is necessary to be
able to judge the rank of the matrices $A_i$. It has been pointed out ([2]) that in
certain cases, finding the rank of the matrices $A_i$ and using Cochran's theorem
is equivalent to showing that $A_iA_j = \phi$ for all $i \ne j$, or to showing that each $A_i$
is an idempotent matrix. In many cases it is easier to show that a matrix is
idempotent than it is to find the rank of the matrix. Therefore, we will prove some
theorems which are new, and which enable us to determine the distribution of
the quadratic forms in equations similar to (1) without having to find the rank
of the $A_i$.

The first theorem which we shall prove is an algebraic theorem about sym- 
metric matrices which is useful in developing theorems concerning the distribu- 
tion of quadratic forms. 

THEOREM 1. Let $A_1, A_2, \cdots, A_m$ be a collection of n × n symmetric matrices
where the rank of $A_i$ is $p_i$, and let $A = \sum_{i=1}^{m} A_i$ where the rank of A is p. Consider
the four conditions:


$C_1$. Each $A_i$ is an idempotent matrix.
$C_2$. $A_iA_j = \phi$ for all $i \ne j$.
$C_3$. A is an idempotent matrix.
$C_4$. $p = \sum_{i=1}^{m} p_i$; i.e., the rank of the sum of the $A_i$ equals the sum of the ranks
of the $A_i$.

The following are true:
(a) Any two of the three conditions $C_1$, $C_2$, $C_3$ imply all four of the conditions
$C_1$, $C_2$, $C_3$, $C_4$.

(b) Conditions $C_3$ and $C_4$ imply $C_1$ and $C_2$.

Proof. We will first prove (a). To do this we will show that any two of the
conditions $C_1$, $C_2$, $C_3$ imply the remaining one in the set $C_1$, $C_2$, $C_3$, and then show
that the three conditions $C_1$, $C_2$, and $C_3$ imply $C_4$. We might point out that if
A = I then this is essentially the theorem which Craig and Hotelling proved
([2], [4]).

Suppose $C_1$ and $C_3$ are given. Since A is given to be idempotent of rank p, there
exists an orthogonal transformation P such that

$P'AP = \begin{pmatrix} I_p & \phi \\ \phi & \phi \end{pmatrix}.$

Thus we have

$P'AP = \begin{pmatrix} I_p & \phi \\ \phi & \phi \end{pmatrix} = \sum_{i=1}^{m} P'A_iP.$

Since $A_i$ is idempotent, $P'A_iP$ is also idempotent, and by Theorems D and E,
the last (n − p) diagonal elements of each $P'A_iP$ must be zero. This is true
since by Theorem D the diagonal elements of an idempotent matrix are
nonnegative and since any one of the last (n − p) diagonal elements when summed
over the m matrices is zero, each of the last (n − p) diagonal elements must be
zero. Then by Theorem E the last (n − p) rows and (n − p) columns of each
$P'A_iP$ must be zero. Thus we can write

$P'A_iP = \begin{pmatrix} B_i & \phi \\ \phi & \phi \end{pmatrix}.$

Extracting the (p × p) matrix in the upper left-hand corner of
$P'AP = \sum_{i=1}^{m} P'A_iP$
we have $I_p = \sum_{i=1}^{m} B_i$, where the $B_i$ are idempotent of rank $p_i$. Theorem K
implies that $B_iB_j = \phi$ for $i \ne j$; therefore $A_iA_j = \phi$ if $i \ne j$, and the proof is
complete that $C_1$ and $C_3$ imply $C_2$.

Now suppose $C_1$ and $C_2$ are given. We have


$AA = \left(\sum_{i=1}^{m} A_i\right)^2 = \sum_{i=1}^{m} A_i^2 + \sum_{i \ne j} A_iA_j = \sum_{i=1}^{m} A_i = A.$

Thus we have shown that the sum is idempotent and $C_3$ is satisfied.

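Now suppose $C_2$ and $C_3$ are given. By Theorem J there exists an orthogonal
matrix P such that $P'A_1P, P'A_2P, \cdots, P'A_mP$ are each diagonal (since $A_iA_j =
A_jA_i = \phi$), and since the sum of diagonal matrices is a diagonal matrix it also
follows that P'AP is diagonal. By $C_2$ it follows that $P'A_iPP'A_jP = \phi$ for all
$i \ne j$, so that no two of these diagonal matrices have a nonzero element in the
same diagonal position. Each nonzero diagonal element of $P'A_iP$ is therefore
equal to the corresponding diagonal element of the idempotent diagonal matrix
P'AP, and hence is equal to unity.

It follows, therefore, that $P'A_iP$ is idempotent, and hence $A_i$ is idempotent
for all i, and the proof is complete.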

We will now show that $C_1$, $C_2$, and $C_3$ imply $C_4$. If the three conditions
$C_1$, $C_2$, and $C_3$ are true, then this implies that there exists an orthogonal matrix
P such that the following are true:

$P'AP = \begin{pmatrix} I_p & \phi \\ \phi & \phi \end{pmatrix}$, and the $P'A_iP$ are each diagonal matrices with $p_i$ (the rank of $A_i$)
ones on the diagonal and $(n - p_i)$ zeros on the diagonal. Thus since


$\sum_{i=1}^{m} P'A_iP = \begin{pmatrix} I_p & \phi \\ \phi & \phi \end{pmatrix}$


it is quite clear that the total number of ones on the diagonals of the $P'A_iP$ (i = 1, 2,
$\cdots$, m) is equal to p and the result follows.
We will now prove (b). Since A is given as idempotent, there exists an orthogonal
matrix P such that $P'AP = \begin{pmatrix} I_p & \phi \\ \phi & \phi \end{pmatrix}$. Applying this transformation to the
$A_i$ gives $P'A_iP = M_i$, and $M_i$ has rank $p_i$. Partition $M_i$ such that


$M_i = \begin{pmatrix} B_i & C_i' \\ C_i & D_i \end{pmatrix}$

where $B_i$ is a p × p symmetric matrix. Since $\sum_{i=1}^{m} M_i = \begin{pmatrix} I_p & \phi \\ \phi & \phi \end{pmatrix}$ we have
$\sum_{i=1}^{m} B_i = I_p$. Clearly the rank of $B_i$ must be less than or equal to the rank
of $M_i$. Therefore, let the rank of $B_i$ equal $p_i - k_i$ where $k_i \ge 0$. But the rank
of the sum of matrices is less than or equal to the sum of the ranks, hence


$\sum_{i=1}^{m} (p_i - k_i) \ge p.$

This gives $-\sum_{i=1}^{m} k_i \ge 0$, so $k_i = 0$ for i = 1, 2, $\cdots$, m, and the rank of $B_i$
is equal to $p_i$. Applying Theorem K to the equation $\sum B_i = I_p$ it follows that
$B_i$ is idempotent (i = 1, 2, $\cdots$, m) and $B_iB_j = \phi$ for all $i \ne j$. By Theorem
J we know that there exists an orthogonal matrix Q such that $Q'B_iQ$ is diagonal
for i = 1, 2, $\cdots$, m. Let $Q'B_iQ = E_i$ where $E_i$ is a p × p diagonal matrix with
$p_i$ diagonal elements equal to unity and the remaining diagonal elements equal
to zero. Also, $\sum_{i=1}^{m} E_i = I_p$, so there is exactly one matrix in the
set $E_1, E_2, \cdots, E_m$ whose tth diagonal element (for any t = 1, 2, $\cdots$, p) is
equal to unity. All the remaining $E_i$ have the tth diagonal element equal to zero.
Since Q is orthogonal we know that


$R = \begin{pmatrix} Q & \phi \\ \phi & I_{n-p} \end{pmatrix}$, of order n × n,


is also orthogonal. Using this transformation on the equation


$\sum_{i=1}^{m} M_i = \begin{pmatrix} I_p & \phi \\ \phi & \phi \end{pmatrix}$

gives

$R'M_iR = \begin{pmatrix} E_i & F_i' \\ F_i & G_i \end{pmatrix}$


and the rank of $R'M_iR$ equals the rank of $E_i$. But then $(F_i, G_i) = T_i(E_i, F_i')$
where $T_i$ is an (n − p) × p matrix and $G_i = T_iE_iT_i'$. Let $t_i$ be the first row of
$T_i$. Then the first diagonal element of $G_i$ is $t_iE_it_i'$ which is a sum of squares of
some of the elements of $t_i$ and the first row of $F_i$ is $t_iE_i$ which is a vector
containing those same elements of $t_i$ and zeros. Hence $\sum G_i = \phi$ implies that $t_iE_it_i' =
0$ and $t_iE_i = \phi$. The first row of $G_i$ is $t_iE_iT_i' = \phi$. Applying this argument to
each row of $T_i$ we have $F_i = \phi$ and $G_i = \phi$.


Hence

$R'M_iR = \begin{pmatrix} E_i & \phi \\ \phi & \phi \end{pmatrix}$


and $R'M_iRR'M_jR = \phi$ (for all $i \ne j$) and $R'M_iR$ is idempotent. Hence, $A_i$
is idempotent and $A_iA_j = \phi$ (for all $i \ne j$), and the proof is complete.
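
The four conditions of Theorem 1 can be checked by hand on a familiar decomposition. As an illustration (the matrices are chosen here and do not appear in the paper), take n = 4, $A_1 = J/n$ with J the matrix of ones, and $A_2 = I - J/n$. The sketch below uses exact rational arithmetic; ranks are read off as traces, which is legitimate here because the matrices are idempotent (Theorem B).

```python
from fractions import Fraction

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

n = 4
identity = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
zero = [[Fraction(0)] * n for _ in range(n)]

a1 = [[Fraction(1, n)] * n for _ in range(n)]                           # J/n
a2 = [[identity[i][j] - a1[i][j] for j in range(n)] for i in range(n)]  # I - J/n
a_sum = [[a1[i][j] + a2[i][j] for j in range(n)] for i in range(n)]     # A = A1 + A2

c1 = matmul(a1, a1) == a1 and matmul(a2, a2) == a2   # each A_i idempotent
c2 = matmul(a1, a2) == zero                          # A_1 A_2 = null matrix
c3 = matmul(a_sum, a_sum) == a_sum                   # A idempotent
c4 = trace(a_sum) == trace(a1) + trace(a2)           # rank(A) = sum of ranks
```

All four flags come out true, with ranks 1 and 3 adding to 4, in agreement with part (a) of the theorem.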

It has been pointed out (Craig, [2]) that if Y is distributed as $N_n(\phi, I)$, then a
necessary and sufficient condition that Y'AY be distributed as $\chi^2(p)$ is that A
be an idempotent matrix of rank p. We will generalize this result into
THEOREM 2. If Y is distributed as $N_n(\mu, I)$, then a necessary and sufficient
condition that Y'AY is distributed as $\chi'^2(k, \lambda)$ (where $\lambda = \frac{1}{2}\mu'A\mu$) is that A be an
idempotent matrix of rank k.


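Proof. We will first prove sufficiency. Let P be an orthogonal matrix such that

$P'AP = \begin{pmatrix} I_k & \phi \\ \phi & \phi \end{pmatrix}$ and let Z = P'Y. Then $Z = \begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix}$ is distributed as $N_n(\alpha, I)$

where $\alpha = \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} = P'\mu$ and where $\alpha_1$ and $Z_1$ are each k × 1 vectors. $Z_1$ is
distributed as $N_k(\alpha_1, I)$. Also $Y'AY = Z'P'APZ = Z_1'Z_1$. Thus by Theorem F,
$Y'AY = Z_1'Z_1$ is distributed as $\chi'^2(k, \lambda)$ where $\lambda = \frac{1}{2}\alpha_1'\alpha_1$. This proves
sufficiency if we can show that $\alpha_1'\alpha_1 = \mu'A\mu$. To do this let $P = (P_1, P_2)$ where $P_1$
has dimension n × k, then


$\mu'A\mu = \mu'PP'APP'\mu = \mu'(P_1, P_2)\,P'AP \begin{pmatrix} P_1' \\ P_2' \end{pmatrix}\mu = (\mu'P_1, \mu'P_2)\begin{pmatrix} I_k & \phi \\ \phi & \phi \end{pmatrix}\begin{pmatrix} P_1'\mu \\ P_2'\mu \end{pmatrix} = \mu'P_1P_1'\mu = \alpha_1'\alpha_1.$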

To prove necessity, we will assume that Y'AY is distributed as $\chi'^2(k, \lambda)$ and
show that this implies that A is an idempotent matrix of rank k. We know that
there exists an orthogonal matrix C such that C'AC = D where D is a diagonal
matrix where the number of non-zero diagonal elements, $d_{ii}$, equals the rank of
A. Let Z = C'Y, then $Y'AY = Z'C'ACZ = Z'DZ = \sum_{i=1}^{n} d_{ii}z_i^2$. Since Z is
distributed as $N_n(C'\mu, I)$, we know by Theorem F that $z_i^2$ is distributed as
$\chi'^2(1, \lambda_i)$ where $\lambda_i = [E(z_i)]^2/2$. Since the $z_i$ are independent the moment
generating function of $\sum d_{ii}z_i^2$ is


$\prod_{i=1}^{n} (1 - 2t\,d_{ii})^{-1/2}\,e^{-\lambda_i + \lambda_i(1 - 2t\,d_{ii})^{-1}}.$


Also, since the hypothesis states that Y'AY is distributed as $\chi'^2(k, \lambda)$ (where
$\lambda = \frac{1}{2}\mu'A\mu$) the moment generating function of Y'AY is $(1 - 2t)^{-k/2}e^{-\lambda + \lambda(1 - 2t)^{-1}}$.
Since $Y'AY = \sum d_{ii}z_i^2$, the moment generating functions are equal, and we get


(2) $(1 - 2t)^{-k/2}\,e^{-\lambda + \lambda(1 - 2t)^{-1}} = \prod_{i=1}^{n} (1 - 2t\,d_{ii})^{-1/2}\,e^{-\lambda_i + \lambda_i(1 - 2t\,d_{ii})^{-1}}.$


It is clear that there exists a neighborhood of zero for t such that the
quantities on the left- and right-hand sides of Eq. (2) exist and have derivatives of
all orders.

If any of the $d_{ii}$ were neither 0 nor 1, the right-hand side of this identity would
be an analytic function of t with different singularities than the left-hand side.
By this same argument it follows that exactly k of the $d_{ii}$ are one and the others
vanish. It also follows that $\lambda = \sum \lambda_i$, the sum being taken over those i for
which $d_{ii} = 1$.

Thus we have shown that if Y'AY is distributed as $\chi'^2(k, \lambda)$, then k of the $d_{ii}$
are equal to unity, and n − k of the $d_{ii}$ are equal to zero. But the $d_{ii}$ are the
characteristic roots of A, and hence A is an idempotent matrix of rank k and the
theorem is established.
It might be pointed out that $\lambda = 0$, and $\chi'^2(k, \lambda)$ degenerates to $\chi^2(k)$ if and
only if $A\mu = \phi$.
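
The remark on the noncentrality can be illustrated with a small exact computation. In the sketch below the idempotent matrix is taken, as an assumption for illustration only, to be the centering matrix $A = I - J/n$ with n = 3; the helper names are hypothetical. A mean vector with equal components satisfies $A\mu = \phi$ and gives $\lambda = 0$, whereas a mean vector with unequal components gives $\lambda > 0$.

```python
from fractions import Fraction

def quadratic_form(a, x):
    """Return x'Ax for a square matrix a and a vector x."""
    size = len(x)
    return sum(x[i] * a[i][j] * x[j] for i in range(size) for j in range(size))

def noncentrality(a, mu):
    """lambda = (1/2) mu' A mu, as in Theorem 2."""
    return Fraction(1, 2) * quadratic_form(a, mu)

n = 3
# Centering matrix I - J/n: symmetric, idempotent, rank n - 1.
centering = [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
             for i in range(n)]

lam_constant = noncentrality(centering, [Fraction(5)] * n)     # A mu is null
lam_varying = noncentrality(centering, [Fraction(1), Fraction(2), Fraction(3)])
```

For the second mean vector, $\mu'A\mu = \sum(\mu_i - \bar{\mu})^2 = 2$, so `lam_varying` is 1 and Y'AY is noncentral with 2 degrees of freedom.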
Using Theorem 1 and Theorem 2 of this section and Theorem H of the
preceding section, we can state the following Theorems:
THEOREM 3. If Y is distributed as $N_n(\mu, I)$ and if $Y'AY = \sum_{i=1}^{k} Y'A_iY$ where
the rank of A equals p and the rank of $A_i$ equals $p_i$, then
(1) any two of the three conditions $C_1$, $C_2$, $C_3$ are necessary and sufficient for
all the remaining conditions $C_1, \cdots, E_1$;
(2) any two of the three conditions $D_1$, $D_2$, $D_3$ are necessary and sufficient for
all the remaining conditions $C_1, \cdots, E_1$;
(3) any two conditions $C_i$ and $D_j$, $i \ne j$, are necessary and sufficient for all the
remaining conditions;
(4) $E_1$ and $C_3$ are necessary and sufficient for all the remaining conditions;
(5) $E_1$ and $D_3$ are necessary and sufficient for all the remaining conditions.
$C_1$: $Y'A_iY$ is distributed as $\chi'^2(p_i, \lambda_i)$ where $\lambda_i = (\mu'A_i\mu)/2$ for i = 1, 2,
$\cdots$, k.
$C_2$: $Y'A_iY$ and $Y'A_jY$ are independent for all $i \ne j$.
$C_3$: Y'AY is distributed as $\chi'^2(p, \lambda)$ where $\lambda = (\mu'A\mu)/2$.
$D_1$: Each $A_i$ is an idempotent matrix.
$D_2$: $A_iA_j = \phi$ for all $i \ne j$.
$D_3$: A is an idempotent matrix.
$E_1$: $\sum_{i=1}^{k} p_i = p$.
THEOREM 4. In Theorems 2 and 3 if Y is distributed as $N_n(\mu, \sigma^2 I)$ then all the
results follow except each quadratic form and each $\lambda$ and $\lambda_i$ must be divided by $\sigma^2$.


Cochran's theorem states: if Y is distributed as $N_n(\phi, I)$, and if


$Y'Y = \sum_{i=1}^{k} Y'A_iY$


(where the rank of $A_i$ is $n_i$), then a necessary and sufficient condition that $Y'A_iY$ (i =
1, 2, $\cdots$, k) are independently distributed respectively as $\chi^2(n_i)$ (i = 1, 2, $\cdots$, k)
is that $\sum_{i=1}^{k} n_i = n$. Madow extended this to (Madow, 1940): if Y is distributed
as $N_n(\mu, I)$ and if $Y'Y = \sum_{i=1}^{k} Y'A_iY$ (where the rank of $A_i$ is $n_i$), then a
necessary and sufficient condition that $Y'A_iY$ (i = 1, 2, $\cdots$, k) are independently
distributed as $\chi'^2(n_i, \lambda_i)$ is that $\sum_{i=1}^{k} n_i = n$.

We will now extend these theorems. 

THEOREM 5. If Y is distributed as $N_n(\mu, V)$ where V is an n × n positive
definite symmetric matrix, and if $Y'BY = \sum_{i=1}^{k} Y'B_iY$ where the rank of $B_i$ is $p_i$
and the rank of B is p, then any one of the six conditions, $C_1$, $C_2$, $C_3$, $C_4$, $C_5$, $C_6$,
is necessary and sufficient that the $Y'B_iY$ be independently distributed as $\chi'^2(p_i, \lambda_i)$
where $\lambda_i = \frac{1}{2}\mu'B_i\mu$.

$C_1$: BV be idempotent and $\sum_{i=1}^{k} p_i = p$.

$C_2$: BV and each $B_iV$ be idempotent.

$C_3$: BV be idempotent and $B_iVB_j = \phi$ for all $i \ne j$.

$C_4$: Y'BY be distributed as $\chi'^2(p, \lambda)$ and $p = \sum_{i=1}^{k} p_i$ (where $\lambda = \frac{1}{2}\mu'B\mu$).

$C_5$: Y'BY be distributed as $\chi'^2(p, \lambda)$ and each $B_iV$ be idempotent (where $\lambda = \frac{1}{2}\mu'B\mu$).
$C_6$: Y'BY be distributed as $\chi'^2(p, \lambda)$ and $B_iVB_j = \phi$ for $i \ne j$ (where $\lambda =
\frac{1}{2}\mu'B\mu$).
Proof. Since V is positive definite, there exists a non-singular matrix P such
that $P'VP = I_n$. Let Z = P'Y; then Z is distributed as $N_n(P'\mu, I_n)$. Also
$Y'BY = Z'P^{-1}BP'^{-1}Z$, $Y'B_iY = Z'P^{-1}B_iP'^{-1}Z$, and


$Z'(P^{-1}BP'^{-1})Z = \sum_{i=1}^{k} Z'(P^{-1}B_iP'^{-1})Z.$


If we let $A = P^{-1}BP'^{-1}$ and $A_i = P^{-1}B_iP'^{-1}$, then we have $Z'AZ = \sum_{i=1}^{k} Z'A_iZ$,
and the results follow immediately from Theorem 3 if we can show that: A being
idempotent is equivalent to BV being idempotent; $A_i$ being idempotent is
equivalent to $B_iV$ being idempotent; $B_iVB_j = \phi$ for $i \ne j$ is equivalent to
$A_iA_j = \phi$ for $i \ne j$. To show these we proceed as follows: If A is idempotent
then this means $(P^{-1}BP'^{-1})(P^{-1}BP'^{-1}) = P^{-1}BP'^{-1}$. Performing left
multiplication by P and right multiplication by P' gives $BP'^{-1}P^{-1}B = B$. But $P'^{-1}P^{-1} =
V$, hence BVB = B or (BV)(BV) = BV. Thus A being idempotent implies
that BV is idempotent. Starting with (BV)(BV) = BV and reversing these steps
(BV)(BV) = BV gives BVB = B on right multiplication by $V^{-1}$) we arrive at AA =
A. Hence A being idempotent is equivalent to BV being idempotent. A similar
procedure will work when applied to $B_iV$. To show that $B_iVB_j = \phi$ for $i \ne j$
is equivalent to $A_iA_j = \phi$ for $i \ne j$, proceed as follows: If $B_iVB_j = \phi$ for $i \ne j$,
then $\phi = P^{-1}B_iP'^{-1}P'VPP^{-1}B_jP'^{-1} = A_iA_j = A_jA_i$. The reverse
procedure also follows, and hence the theorem is established.

(In this theorem, AV and the $A_iV$ need not be symmetric. Also it should be
remembered that AV being idempotent is equivalent to VA being idempotent
and similarly for $A_iV$).

We also note that putting k = 1 we get the

COROLLARY 5.1. If Y is distributed as $N_n(\mu, V)$ where V is a positive definite
matrix, then a necessary and sufficient condition that Y'AY be distributed as $\chi'^2(p, \lambda)$
where p is the rank of A and where $\lambda = \frac{1}{2}\mu'A\mu$ is that AV be idempotent (not
necessarily symmetric).
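
A small numerical case may help to show what Corollary 5.1 asks for. The sketch below assumes, for illustration only, a diagonal covariance matrix V = diag(1, 2, 4) and the weighted centering matrix $A = V^{-1} - V^{-1}\mathbf{1}(\mathbf{1}'V^{-1}\mathbf{1})^{-1}\mathbf{1}'V^{-1}$; neither is taken from the paper. It confirms that AV is idempotent although it is not symmetric, and obtains its rank from its trace.

```python
from fractions import Fraction

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

n = 3
v_diag = [Fraction(1), Fraction(2), Fraction(4)]        # illustrative variances
v = [[v_diag[i] if i == j else Fraction(0) for j in range(n)] for i in range(n)]

v_inv_one = [1 / d for d in v_diag]                     # the vector V^{-1} 1
scale = 1 / sum(v_inv_one)                              # (1' V^{-1} 1)^{-1}

# A = V^{-1} - V^{-1} 1 (1' V^{-1} 1)^{-1} 1' V^{-1}, a symmetric matrix
a = [[(1 / v_diag[i] if i == j else Fraction(0))
      - scale * v_inv_one[i] * v_inv_one[j] for j in range(n)] for i in range(n)]

av = matmul(a, v)
av_is_idempotent = matmul(av, av) == av
av_is_symmetric = all(av[i][j] == av[j][i] for i in range(n) for j in range(n))
rank_via_trace = sum(av[i][i] for i in range(n))        # trace of an idempotent matrix
```

Here AV is idempotent with trace 2, so by the corollary Y'AY would be distributed as a noncentral Chi-square with 2 degrees of freedom; the rank of A equals that of AV because V is nonsingular.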


4. Illustrations. Consider the linear hypothesis model $Y = X\beta + e$ defined
in Sec. 3. If we partition the X matrix and $\beta$ vector such that


$X = (X_1, X_2)$ and $\beta = \begin{pmatrix} \alpha \\ \gamma \end{pmatrix}$


where $X_1$ is of order n × $p_1$ and $\alpha$ is a $p_1$ × 1 vector, then we can write $Y =
X\beta + e$ as $Y = X_1\alpha + X_2\gamma + e$.
To test the hypothesis $H_0: \alpha = \phi$, we can form the ratio


(4.1) $u = \dfrac{Q_1}{Q}\cdot\dfrac{n - p}{p_1},$

where, when $H_0$ is true, u is distributed as Snedecor's F with $p_1$ and n − p degrees of freedom. The
quantities $Q_1$ and Q can be derived (Kempthorne, 1952) by the following process:
Q is the minimum value of e'e with respect to the parameters in the model $Y =
X\beta + e = X_1\alpha + X_2\gamma + e$.
$Q_1 = Q_2 - Q$ where $Q_2$ is the minimum value of e'e with respect to the model
$Y = X_2\gamma + e$ (the model restricted by $H_0$). By a straightforward application of
a minimization procedure we see that


$Q = Y'(I - XS^{-1}X')Y = Y'AY$ and $Q_2 = Y'(I - X_2S_2^{-1}X_2')Y = Y'BY$


where $S = X'X$, $S_2 = X_2'X_2$, $I - XS^{-1}X' = A$, and $I - X_2S_2^{-1}X_2' = B$. To
find the distribution of $Q/\sigma^2$ and $Q_1/\sigma^2$ the method sometimes employed
(Kempthorne, 1952) is quite a complex procedure of finding the ranks of the
corresponding matrices A and B and applying Cochran's theorem. An alternative method
using theorems on idempotent matrices to obtain the distribution of u when $H_0$
is true and when $H_1: \alpha \ne \phi$ is true is as follows:

Obviously A and B are each idempotent. Since


(4.2) $X'(I - XS^{-1}X') = \phi,$


it is clear that $X_1'(I - XS^{-1}X') = \phi$ and $X_2'(I - XS^{-1}X') = \phi$. Let C = B − A;
then by using (4.2),


$C = (I - X_2S_2^{-1}X_2') - (I - XS^{-1}X')$


is clearly idempotent and AC = $\phi$. Hence by Theorem 3 we have
1. $Q/\sigma^2 = (Y'AY)/\sigma^2$ is distributed as $\chi'^2(n - p, \lambda_A)$.
2. $Q_1/\sigma^2 = (Y'CY)/\sigma^2$ is distributed as $\chi'^2(p_1, \lambda_C)$.
3. Q and $Q_1$ are independent.
4. $\lambda_A = [1/(2\sigma^2)]\,(\beta'X'AX\beta) = [1/(2\sigma^2)]\,[\beta'X'(I - XS^{-1}X')X\beta] = 0$, so $Q/\sigma^2$ is
distributed as $\chi^2(n - p)$.
5. $\lambda_C = [1/(2\sigma^2)]\,[\beta'X'\{(I - X_2S_2^{-1}X_2') - (I - XS^{-1}X')\}X\beta]
= [1/(2\sigma^2)]\,[\alpha'(X_1'X_1 - X_1'X_2S_2^{-1}X_2'X_1)\alpha],$
and since $X_1'X_1 - X_1'X_2S_2^{-1}X_2'X_1$ is positive definite, $Q_1/\sigma^2$ has the central
Chi-square distribution if and only if $\alpha = \phi$; i.e., if and only if $H_0$ is true.

Hence by Theorem L, $u = (Q_1/Q)\cdot[(n - p)/p_1]$ is distributed as
$F'(p_1, n - p, \lambda_C)$ and reduces to the central F (Snedecor's F) if and only if $H_0$ is
true.
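
The matrix identities used in this illustration can be confirmed on a small design. The sketch below assumes a hypothetical model with n = 4, one regressor $X_1 = (1, 2, 3, 4)'$ and an intercept column $X_2$ of ones, so that p = 2 and $p_1$ = 1; these data and the helper names are illustrative and are not from the paper. Exact rational arithmetic is used, and ranks are obtained as traces of idempotent matrices (Theorem B).

```python
from fractions import Fraction

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def inverse(m):
    """Gauss-Jordan inverse of a small nonsingular matrix of Fractions."""
    size = len(m)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(size)]
           for i, row in enumerate(m)]
    for col in range(size):
        pivot = next(r for r in range(col, size) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]

def residual_projector(x):
    """Return I - X (X'X)^{-1} X'."""
    xt = transpose(x)
    hat = matmul(matmul(x, inverse(matmul(xt, x))), xt)
    n = len(x)
    return [[Fraction(int(i == j)) - hat[i][j] for j in range(n)] for i in range(n)]

def trace(m):
    return sum(m[i][i] for i in range(len(m)))

x1 = [[Fraction(t)] for t in (1, 2, 3, 4)]      # hypothetical regressor
x2 = [[Fraction(1)] for _ in range(4)]          # intercept column
x = [r1 + r2 for r1, r2 in zip(x1, x2)]         # X = (X1, X2)

a = residual_projector(x)                       # A = I - X S^{-1} X'
b = residual_projector(x2)                      # B = I - X2 S2^{-1} X2'
c = [[b[i][j] - a[i][j] for j in range(4)] for i in range(4)]   # C = B - A
zero = [[Fraction(0)] * 4 for _ in range(4)]

a_idempotent = matmul(a, a) == a
b_idempotent = matmul(b, b) == b
c_idempotent = matmul(c, c) == c
a_times_c_is_null = matmul(a, c) == zero
```

The traces of A and C are n − p = 2 and $p_1$ = 1, the degrees of freedom that appear in statements 1 and 2 above.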
SOME EXAMPLES WITH FIDUCIAL RELEVANCE


By John W. Tukey


Princeton University 


1. Summary. It has been believed by some ([17], p. 204, and perhaps, by
implication [21], p. 2, near line 10) that, and R. A. Fisher [e.g. at the Lake Junaluska
conference in 1946] has urged the desirability of determining whether, the
distribution induced by a pivotal, sufficient and smoothly invertible set of quan- 
tities is unique (that is to say, the induced distribution is independent of the 
choice of a particular set of pivotal quantities among those sets with these proper- 
ties). If true, such uniqueness would be important in connection with the theory 
of fiducial probability. 

It is the purpose of this paper to present certain examples of particular interest 
showing that these conditions do not provide uniqueness. The first example 
applies to any family of two-dimensional normal distributions with fixed and 
known variances and covariances. A one-parameter family of pivotal pairs of 
quantities are provided, such that no two of the induced distributions are the 
same. Each pair is sufficient, and consists of two independent quantities, each 
distributed according to a unit normal distribution. Each pair is shown to be 
smoothly invertible of every finite order. This example can be extended to the 
Behrens-Fisher situation. The second example is due to L. J. Savage, and exhibits 
a two-parameter situation where the two alternative pairs of pivotal quantities 
constructed according to the prescription of Segal [24] give rise to different dis- 
tributions. 

Mauldon [19] has recently published a quite different example of nonunique- 
ness which is also based on the bivariate normal distribution. In his example, 
the means are known and the second moments are to be estimated, so that there 
are 3 essential parameters. 

The paper concludes with a reasonably complete bibliography of papers on 
fiducial probability. 


2. Introduction. The history of fiducial inference has been clouded with dispute 
and failures of understanding—possibly, however, to no greater extent than is 
reasonably to be expected when basic new concepts are being forged between the 
hammer of mathematics and the anvil of concrete applications. This is not the 
place to review this history, to try to describe fiducial inference as it appears 
today, or to compare it with other schemes of inference (even as seen by one 
person). [The writer hopes to do the latter two of these elsewhere (cf. [25]).]

The uniqueness of the result of the fiducial argument has been held by Fisher
to be of central importance, and conditions which ensure, or might ensure,
uniqueness have been important to him and his colleagues (e.g., [21]). The uniqueness
problem does not seem to have received the attention which it deserved, even
though it was a relatively completely formulated mathematical problem, and 
could be discussed without touching on any of the relatively sensitive issues of 
philosophy or principle associated with fiducial inference. 

The main example given here appeared first in a paper on “The Purposes of 
Fiducial Inference” which was presented, as part of a symposium on ‘‘Probability 
and Statistical Inference’’, to the Econometric Society and the Institute of 
Mathematical Statistics in Minneapolis on September 6, 1951. The example, as 
presented at that time, was formally the same, but no detailed proof of smooth 
invertibility was given, and the need for one was not adequately recognized. The 
present improvements both in scope and simplicity of approach are the results 
of comments and stimulation by L. J. Savage, to whom go the author’s best 
thanks. 

The example showing that the Segal construction can lead to nonuniqueness 
is due entirely to Savage, and dates from about the same time (summer, 1952). 
It was first communicated to the writer in November, 1952. 

In so far as examples of nonuniqueness are concerned, the earliest seems to be 
an unpublished example of P. H. Diananda. Savage informed the writer (in 
November, 1952) that this is referred to in a Cambridge thesis [29], also un- 
published, of R. M. Williams. M. S. Bartlett says that this example is based
on the Wishart distribution, and obtains different results in a symmetrical way 
by starting with the variance of first one variable and then the other. It thus 
presumably belongs to the same class as Savage’s example, and is probably some- 
what more complicated to discuss. 

More recently, Mauldon has published an explicit example, and has indicated 
the existence of a family of examples based on the second moments of a family 
of normal distributions with known, fixed first moments. His examples involve 
a minimum of 3 parameters, and are thus somewhat more complex. His explicit 
example involves the permutation of a pair of observations and parameters, as 
does Savage’s example, but in a somewhat less informative situation, since the 
set of pivotal quantities used were not constructed according to a general method 
of independent interest, as was the case with Savage’s example. The indicated 
extensions include continuous-parameter families of alternatives, but seem likely 
to be less understandable than the continuous-parameter cases described here. 
(Charles Stein has pointed out a way of looking at Mauldon’s example in terms 
of the subgroup of triangular matrices and its conjugate subgroups which makes 
this example more interesting.) 


So far as I know, these are the available examples concerning the nonunique- 
ness of the result of the fiducial argument (as described in [33]) under varying 
assumptions. (Conditions under which uniqueness clearly exists can be given, 
but we shall not discuss them here.) 


3. Formalities. A specification determines the probability distribution as a
function of parameters. A quantity is a function of observations and parameters,
defined for all possible combinations of admissible values of its arguments.
(Values of the parameters are appropriate as arguments whether or not they de- 
termine the distribution of the observations concerned.) A set of quantities is 
pivotal (with respect to a specification) if, whatever admissible values of param- 
eters are inserted in the quantities and are, at the same time, the parameters 
determining the distribution of the observations appearing in the quantities, 
the resulting set of statistics has the same distribution. A set of quantities is 
sufficient if, when any arbitrary fixed values of the parameters are inserted as 
arguments, the resulting set of statistics is sufficient for the parameters deter- 
mining the distribution of the observations. A set of quantities is smoothly in- 
vertible (of class a), if, when any possible set of observations is inserted as 
arguments, the mapping from parameters to quantities: 


(1) has the same range for any possible set of observations, 

(2) is 1 to 1, and hence has a single-valued inverse, and 

(3) this inverse is continuous (and has continuous derivatives of all orders 
up to a). 


If a set of quantities are pivotal and smoothly invertible, then each set of possible 
observations induces a distribution on parameter space. [A distribution is 
uniquely associated with the pivotal quantities by their pivotal property. Fixing 
a set of possible observations fixes a 1 to 1 bicontinuous (i.e., continuous in both 
directions) relation between quantities and parameters which transfers this
distribution to parameter space.]


4. Twisted two-dimensional normals. We now start from a family of two- 
dimensional normal distributions in which all second-degree moments (about the 
mean) are fixed, but where all locations in the plane are possible. If we introduce 
an appropriate coordinate system, the specification becomes the following: 


x and y are normally and independently distributed with averages
$\mu$ and $\nu$ and unit variances.


[We shall abbreviate such statements in the form "x and y are NID($\mu$, $\nu$; I)."]
We deal with one bivariate observation drawn from some distribution of this
family. (This observation may, of course, be the vector of means from a sample
of n from a population NID($\mu$, $\nu$; nI), suitably rescaled to unit variances.)

Then the quantities


$w_1 = x - \mu,$
$w_2 = y - \nu,$

are immediately seen to be NID(0, 0; I) for any $\mu$ and $\nu$, and hence to be pivotal.
Fixing $\mu$ and $\nu$, the values of $w_1$ and $w_2$ determine x and y. The latter are surely
sufficient, since they specify the sample (of one) completely. Thus $w_1$, $w_2$ are
sufficient and pivotal quantities, and since they are obviously smoothly invertible
of all orders, we have one smoothly induced distribution.
But the fact that the probability density for ($w_1$, $w_2$) is constant on circles
about the origin enables us to modify these quantities without disturbing their
distribution.

Let f($\mu$, $\nu$, r) be a sufficiently smooth, but otherwise arbitrary, function of
three variables. Put


$r = \sqrt{w_1^2 + w_2^2} = \sqrt{(x - \mu)^2 + (y - \nu)^2},$


$w_{1f} = (x - \mu)\cos f + (y - \nu)\sin f = w_1\cos f + w_2\sin f,$
$w_{2f} = -(x - \mu)\sin f + (y - \nu)\cos f = -w_1\sin f + w_2\cos f,$


where f = f($\mu$, $\nu$, r). Then $w_{1f}$ and $w_{2f}$ are, for each fixed $\mu$ and $\nu$, also NID(0, 0; I)
and, moreover, we have $r^2 = w_{1f}^2 + w_{2f}^2$. For we have merely twisted each circle
of radius r through the angle f($\mu$, $\nu$, r). Each of these pairs ($w_{1f}$, $w_{2f}$) is pivotal,
and sufficient (since $r^2 = w_{1f}^2 + w_{2f}^2$ is known as soon as $w_{1f}$ and $w_{2f}$ are known,
so that fixing $\mu$ and $\nu$ fixes f and hence makes $w_1$ and $w_2$ available, and thus makes
x and y available!), and, if they are smoothly invertible, are candidates for the
construction of our counterexample.

We shall have use for the Jacobian of the transformation from $\mu$, $\nu$ to $w_{1f}$, $w_{2f}$
(with x, y held fixed). The details of calculation are presented in a later section,
but the result is


$\left|\dfrac{\partial(w_{1f}, w_{2f})}{\partial(\mu, \nu)}\right| = 1 + \begin{vmatrix} x - \mu & f_\mu \\ y - \nu & f_\nu \end{vmatrix} = 1 + (x - \mu)f_\nu - (y - \nu)f_\mu,$

where the partial derivatives of f are taken considering f as a function of the
three independent variables $\mu$, $\nu$ and r. It is shown in the same section that
solutions $\mu$, $\nu$ of


$w_{1f}(x, y, \mu, \nu) = a,$
$w_{2f}(x, y, \mu, \nu) = b,$

exist for any chosen x, y, a and b for any continuous f($\mu$, $\nu$, r) and that, if the
Jacobian above is always positive, then these solutions are unique, and as many
times continuously differentiable as $f_\mu$ and $f_\nu$ themselves.
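
The Jacobian formula, read as $1 + (x - \mu)f_\nu - (y - \nu)f_\mu$, can be compared with a finite-difference approximation. The sketch below assumes a hypothetical twist function $f(\mu, \nu, r) = 0.3\mu - 0.2\nu + 0.1r^2$ and an arbitrary observation and parameter point; these choices and the function names are illustrative only. It also checks that the twist leaves the radius unchanged.

```python
import math

def twisted_pivots(x, y, mu, nu, f):
    """Rotate (x - mu, y - nu) through the angle f(mu, nu, r)."""
    a, b = x - mu, y - nu
    r = math.hypot(a, b)
    angle = f(mu, nu, r)
    return (a * math.cos(angle) + b * math.sin(angle),
            -a * math.sin(angle) + b * math.cos(angle))

def numerical_jacobian(x, y, mu, nu, f, h=1e-6):
    """Central-difference determinant of d(w1f, w2f) / d(mu, nu)."""
    w_mu_hi = twisted_pivots(x, y, mu + h, nu, f)
    w_mu_lo = twisted_pivots(x, y, mu - h, nu, f)
    w_nu_hi = twisted_pivots(x, y, mu, nu + h, f)
    w_nu_lo = twisted_pivots(x, y, mu, nu - h, f)
    d1_dmu = (w_mu_hi[0] - w_mu_lo[0]) / (2 * h)
    d2_dmu = (w_mu_hi[1] - w_mu_lo[1]) / (2 * h)
    d1_dnu = (w_nu_hi[0] - w_nu_lo[0]) / (2 * h)
    d2_dnu = (w_nu_hi[1] - w_nu_lo[1]) / (2 * h)
    return d1_dmu * d2_dnu - d1_dnu * d2_dmu

# Hypothetical twist; its partials with r held fixed are f_mu = 0.3, f_nu = -0.2.
f_example = lambda mu, nu, r: 0.3 * mu - 0.2 * nu + 0.1 * r * r
f_mu, f_nu = 0.3, -0.2

x, y, mu, nu = 1.5, -0.5, 0.4, 0.7            # arbitrary illustrative point
w1, w2 = twisted_pivots(x, y, mu, nu, f_example)
radius_squared = (x - mu) ** 2 + (y - nu) ** 2

formula = 1.0 + (x - mu) * f_nu - (y - nu) * f_mu
numeric = numerical_jacobian(x, y, mu, nu, f_example)
```

Note that the dependence of f on r does not enter the formula, although it does enter the numerical derivative; the two agree because the contributions of $f_r$ cancel in the determinant.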


5. Detailed example. In order to give a detailed counterexample, we specialize
$f(\mu, \nu, r)$ to the form

\[ f(\mu, \nu, r) = \frac{\beta\mu}{1 + r^2}, \]

when our pivotal quantities become, when written out explicitly,

\[ w_{1\beta} = (x-\mu)\cos\frac{\beta\mu}{1 + (x-\mu)^2 + (y-\nu)^2} + (y-\nu)\sin\frac{\beta\mu}{1 + (x-\mu)^2 + (y-\nu)^2}, \]

\[ w_{2\beta} = -(x-\mu)\sin\frac{\beta\mu}{1 + (x-\mu)^2 + (y-\nu)^2} + (y-\nu)\cos\frac{\beta\mu}{1 + (x-\mu)^2 + (y-\nu)^2}. \]

Since $f_\nu = 0$ and $f_\mu = \beta/(1 + r^2)$ we find that

\[ \frac{\partial(w_{1\beta}, w_{2\beta})}{\partial(\mu, \nu)} = 1 - \beta\,\frac{y-\nu}{1 + (x-\mu)^2 + (y-\nu)^2}, \]

which is surely positive for $|\beta| < 2$. To complete the example, we have only to
show that different values of $\beta$ lead, for one and the same set of observations
$x$, $y$, to different induced distributions for $(\mu, \nu)$.
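
The stated Jacobian can be checked numerically. This is a sketch under the assumption that the specialized function is $f = \beta\mu/(1+r^2)$ and that the Jacobian reads $1 - \beta(y-\nu)/(1+r^2)$; the helper names `pivotals`, `jacobian_formula` and `jacobian_numeric` are hypothetical. Central differences of the pivotal pair with respect to $(\mu, \nu)$ are compared with the closed form.

```python
import math

def pivotals(x, y, mu, nu, beta):
    """Pivotal pair for the specialized f = beta * mu / (1 + r^2)."""
    dx, dy = x - mu, y - nu
    angle = beta * mu / (1.0 + dx * dx + dy * dy)
    return (dx * math.cos(angle) + dy * math.sin(angle),
            -dx * math.sin(angle) + dy * math.cos(angle))

def jacobian_formula(x, y, mu, nu, beta):
    dx, dy = x - mu, y - nu
    return 1.0 - beta * dy / (1.0 + dx * dx + dy * dy)

def jacobian_numeric(x, y, mu, nu, beta, h=1e-6):
    """Determinant of d(w1, w2)/d(mu, nu) by central differences."""
    p_mu, m_mu = pivotals(x, y, mu + h, nu, beta), pivotals(x, y, mu - h, nu, beta)
    p_nu, m_nu = pivotals(x, y, mu, nu + h, beta), pivotals(x, y, mu, nu - h, beta)
    d1_mu = (p_mu[0] - m_mu[0]) / (2 * h)
    d2_mu = (p_mu[1] - m_mu[1]) / (2 * h)
    d1_nu = (p_nu[0] - m_nu[0]) / (2 * h)
    d2_nu = (p_nu[1] - m_nu[1]) / (2 * h)
    return d1_mu * d2_nu - d1_nu * d2_mu

point = (0.8, -1.2, 0.5, 0.3, 1.5)  # x, y, mu, nu, beta
print(jacobian_formula(*point), jacobian_numeric(*point))
```

The bound $|y-\nu|/(1+r^2) \le r/(1+r^2) \le 1/2$ is what makes the Jacobian positive whenever $|\beta| < 2$.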

In view of the existence and continuity of all derivatives the induced density
in the $(\mu, \nu)$ plane is

\[ (\text{density for } w_{1\beta}, w_{2\beta}) \cdot \left|\frac{\partial(w_{1\beta}, w_{2\beta})}{\partial(\mu, \nu)}\right| = \left|1 - \beta\,\frac{y-\nu}{1+r^2}\right| \frac{1}{2\pi}\, e^{-r^2/2}, \]

where $r^2 = (x-\mu)^2 + (y-\nu)^2$. This induced density is clearly different for
different values of $\beta$. Subject, then, to the verification of (i) the general value of
the Jacobian, (ii) the general existence of solutions, and (iii) the uniqueness of
solutions when the Jacobian is positive, the announced example is complete.


6. The existence and uniqueness of solutions, and the value of the Jacobian.
We now investigate the existence and uniqueness of solutions $\mu$, $\nu$ of

\[ w_{1f}(x, y, \mu, \nu) = a, \qquad w_{2f}(x, y, \mu, \nu) = b, \]

for any given $x$, $y$, $a$ and $b$. We shall use direct methods, noting that smooth
invertibility, under the conditions we use, is also a consequence of the following
result:

Any a times continuously differentiable mapping from a connected open domain
to a simply connected range whose Jacobian determinant is continuous and of constant
sign, and whose inverse carries compact sets into compact sets, is smoothly
invertible of order a. This invertibility theorem is proved elsewhere [26], and
examples are given there to show that no one of its topological conditions can
be omitted.

Introduce polar coordinates for $(a, b)$ through

\[ a = \rho\cos\theta, \qquad b = \rho\sin\theta \]

and polar-like coordinates for $\mu$, $\nu$ by

\[ \mu = x - r\cos A, \qquad \nu = y - r\sin A; \]

then the equations to be solved for $r$ and $A$ are, when converted to polar
coordinate form, $r = \rho$, and

\[ A - f(\mu, \nu, r) = A - f(x - r\cos A,\; y - r\sin A,\; r) = \theta + 2k\pi \quad (k \text{ an integer}). \]

Clearly we need only study the second equation

\[ g(A) = A - f(x - \rho\cos A,\; y - \rho\sin A,\; \rho) = \theta + 2k\pi \quad (k \text{ an integer}) \]

for given $x$, $y$, $\rho$, $\theta$ and see when it always has a unique solution.
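Moreover, we may use these auxiliary variables to evaluate the Jacobian $J$
of $w_{1f}$ and $w_{2f}$, or, equivalently, of $a$ and $b$, with respect to $\mu$ and $\nu$. We have

\[ J = \frac{\partial(a, b)}{\partial(\mu, \nu)} = \frac{\partial(a, b)}{\partial(\rho, \theta)} \cdot \frac{\partial(\rho, \theta)}{\partial(r, A)} \Big/ \frac{\partial(\mu, \nu)}{\partial(r, A)} = \rho\,\frac{\partial(\rho, \theta)}{\partial(r, A)} \Big/ r = \begin{vmatrix} 1 & 0 \\[2pt] \dfrac{\partial\theta}{\partial r} & \dfrac{\partial\theta}{\partial A} \end{vmatrix} = g'(A), \]

where we have used $\rho = r$ as necessary. Thus if we evaluate $g'(A)$ we will also
evaluate the desired Jacobian $J$.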

As $A$ increases from 0 to $2\pi$, the value of $g(A)$ varies from

\[ g(0) = -f(x - \rho, y, \rho) \quad\text{to}\quad g(2\pi) = 2\pi - f(x - \rho, y, \rho) = g(0) + 2\pi \]

and since it is continuous it must pass through the value $\theta + 2k\pi$ for some integer
$k$. Thus we can always solve the initial pair of equations for any $f(\mu, \nu, r)$ which
is continuous in its arguments (at least in its first and third arguments together.)

If now we show that $g(A)$ is strictly increasing, it will follow that it takes
every value only once, and that solutions not only exist but are unique. Clearly,

\[ J = g'(A) = 1 - (\rho\sin A) f_\mu + (\rho\cos A) f_\nu = 1 - (r\sin A) f_\mu + (r\cos A) f_\nu = 1 - (y-\nu) f_\mu + (x-\mu) f_\nu, \]

and if the Jacobian is always positive, $g(A)$ is monotone, and solutions not
only exist but are unique.
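
The existence and uniqueness argument suggests a direct numerical inversion. The sketch below assumes the specialized function $f = \beta\mu/(1+r^2)$ of Section 5 with $|\beta| < 2$, so that $g(A)$ is increasing; the names `solve_parameters` and `pivotals` are hypothetical. It solves $g(A) = \theta + 2k\pi$ by bisection on $[0, 2\pi]$ and recovers $(\mu, \nu)$ from given $(x, y, a, b)$.

```python
import math

def f(mu, nu, r, beta):
    return beta * mu / (1.0 + r * r)

def pivotals(x, y, mu, nu, beta):
    dx, dy = x - mu, y - nu
    angle = f(mu, nu, math.hypot(dx, dy), beta)
    return (dx * math.cos(angle) + dy * math.sin(angle),
            -dx * math.sin(angle) + dy * math.cos(angle))

def solve_parameters(x, y, a, b, beta, iterations=200):
    """Find (mu, nu) with pivotals(x, y, mu, nu, beta) == (a, b), assuming |beta| < 2."""
    rho = math.hypot(a, b)
    theta = math.atan2(b, a)

    def g(A):
        return A - f(x - rho * math.cos(A), y - rho * math.sin(A), rho, beta)

    g0 = g(0.0)
    # Pick the representative of theta + 2*k*pi lying in [g(0), g(0) + 2*pi).
    target = g0 + (theta - g0) % (2.0 * math.pi)
    lo, hi = 0.0, 2.0 * math.pi
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    A = 0.5 * (lo + hi)
    return x - rho * math.cos(A), y - rho * math.sin(A)

mu, nu = solve_parameters(0.8, -1.2, 0.5, -0.7, beta=1.5)
print(mu, nu, pivotals(0.8, -1.2, mu, nu, 1.5))
```

Bisection is used here only because monotonicity of $g$ guarantees a single crossing; any bracketing root finder would serve equally well.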


Moreover, since 


06 06 
— = ¢(A) =J —=f, 
0A ; or : 
Op _ 

or : 

the unique inverse is clearly as many times continuously differentiable as are
$f_r$ and the Jacobian.

7. The Behrens-Fisher problem. The best-known example to which the
fiducial argument has been applied is the so-called Behrens-Fisher problem,
first treated by W. V. Behrens [31] and afterwards discussed by many writers
(see Breny [32] for a recent review and [33] for Fisher's most recent statement).
The problem arises from two samples, one of $n_1$ observations from a normal
distribution with average $\mu_1$ and variance $n_1\sigma_1^2$, and the other of $n_2$ observations
from a normal distribution with average $\mu_2$ and variance $n_2\sigma_2^2$, when it is desired
to make an inference about $\mu_1 - \mu_2$. (For convenience we have defined $\sigma_1$ and
$\sigma_2$ in an unusual way.)

If we take $x_1$ and $x_2$ as the means of the two samples, and $s_1^2$ and $s_2^2$ as the
conventional estimates of the variances of these means, then

\[ w_1 = \frac{x_1 - \mu_1}{\sigma_1}, \qquad w_2 = \frac{x_2 - \mu_2}{\sigma_2}, \qquad w_3 = \frac{s_1}{\sigma_1}, \qquad w_4 = \frac{s_2}{\sigma_2} \]

are independently distributed sufficient pivotal quantities. The distribution of
$w_1$ and $w_2$ is unit bivariate normal, so that $w_{1f}$ and $w_{2f}$ may be introduced as
before, and (because $\mu_1$ and $\mu_2$ do not appear in $w_3$ or $w_4$, etc.)

\[ \frac{\partial(w_{1f}, w_{2f}, w_3, w_4)}{\partial(\mu_1, \mu_2, \sigma_1, \sigma_2)} = \frac{\partial(w_{1f}, w_{2f})}{\partial(\mu_1, \mu_2)} \left(\frac{\partial w_3}{\partial\sigma_1}\right) \left(\frac{\partial w_4}{\partial\sigma_2}\right), \]

and if the Jacobian of $w_{1f}$, $w_{2f}$ with respect to $\mu_1$, $\mu_2$ depends on $f$ (as it does)
so too does the Jacobian of all four pivotals with respect to all four parameters.
Consequently the induced distribution on parameter space is not unique for the
Behrens-Fisher specification. (It might be interesting to examine other distributions
than the classical one.)


8. Savage's Example. We now set forth the example due to L. J. Savage, which
shows how uniqueness can escape us in a different way.
Let $x$ and $y$ be distributed according to

\[ \psi(x, y \mid \alpha, \beta)\,dx\,dy = \frac{\alpha^2\beta^2}{\alpha+\beta}\,(x + y)\,e^{-\alpha x - \beta y}\,dx\,dy, \]

where $\alpha$ and $\beta$ are positive and $0 \le x, y < \infty$. If the cumulative distribution of
$x$ is $1 - S$ and the cumulative conditional distribution of $y$ given $x$ is $1 - T$,
then

\[ S = \left(1 + \frac{\alpha\beta x}{\alpha+\beta}\right) e^{-\alpha x}, \qquad T = \left(1 + \frac{\beta y}{1 + \beta x}\right) e^{-\beta y}, \]

as is easily verified by integration. From their definition these quantities are
uniformly distributed on $0 \le S, T \le 1$ and are pivotal. (They are essentially
examples of the pivotal quantities pointed out by Segal [24].) Moreover, for $x$
and $y$ fixed, $T$ takes any value between 0 and 1 for suitably chosen $\beta$; and for
$x$, $y$ and $\beta$ fixed, $S$ takes all values between 0 and 1 for suitably chosen $\alpha$; thus
the open square is covered for each choice of $(x, y)$.

Now let us calculate the Jacobian from $(\alpha, \beta)$ to $(S, T)$. Clearly $\partial T/\partial\alpha$
vanishes, so that $\partial S/\partial\beta$ is not involved. The other two derivatives are

\[ \frac{\partial S}{\partial\alpha} = -\frac{\alpha x}{(\alpha+\beta)^2}\,\bigl[(\alpha + 2\beta) + \beta(\alpha+\beta)x\bigr]\,e^{-\alpha x}, \]

\[ \frac{\partial T}{\partial\beta} = -\frac{y}{(1+\beta x)^2}\,\bigl[(1+\beta x)(1 + \beta x + \beta y) - 1\bigr]\,e^{-\beta y}, \]

and are each clearly negative for all positive $\alpha$, $\beta$, $x$, $y$. The Jacobian is their
product and is clearly positive.
The induced density on $0 < \alpha, \beta < \infty$ is given by the pivotal density
(identically unity) multiplied by the Jacobian, namely

\[ x y\, e^{-\alpha x - \beta y} \cdot \frac{\alpha\,\bigl[\alpha + 2\beta + \beta(\alpha+\beta)x\bigr]\bigl[(1+\beta x)(1 + \beta x + \beta y) - 1\bigr]}{(\alpha+\beta)^2 (1+\beta x)^2}\; d\alpha\, d\beta, \]

where we have separated a factor symmetric in $(\alpha, x)$ and $(\beta, y)$ from a factor
manifestly not so symmetric.

The specification at the beginning of this section was symmetric in $(\alpha, x)$
and $(\beta, y)$ so that if we take $1 - S$ as the cumulative distribution of $y$, and
$1 - T$ as the cumulative conditional distribution of $x$ given $y$ and go through
the identical argument, we will find that the induced distribution for $\alpha$, $\beta$ is
obtained from the above by symmetry, by interchanging $\alpha$ with $\beta$ and $x$ with $y$.
The result will clearly not be the same!

Thus the two applications of Segal's process to this specification do not lead
to the same induced distribution. As a consequence we see that conditions of
monotony of pivotal quantities, though they enforce smooth invertibility, cannot
enforce uniqueness of induced distribution. For $S$ and $T$ are monotone in
all of $x$, $y$, $\alpha$, $\beta$, as follows when we note

\[ \frac{\partial S}{\partial\beta} = \frac{\alpha^2 x}{(\alpha+\beta)^2}\, e^{-\alpha x} \]

and use our earlier results.
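
The non-uniqueness in Savage's example can be made concrete. The sketch below assumes the pivotal quantities read $S = (1 + \alpha\beta x/(\alpha+\beta))e^{-\alpha x}$ and $T = (1 + \beta y/(1+\beta x))e^{-\beta y}$; the function names are hypothetical. It evaluates the induced density for the ordering "x first, then y given x" and for the reversed ordering, obtained by interchanging $\alpha$ with $\beta$ and $x$ with $y$, at one parameter point.

```python
import math

def S(alpha, beta, x):
    return (1.0 + alpha * beta * x / (alpha + beta)) * math.exp(-alpha * x)

def T(beta, x, y):
    return (1.0 + beta * y / (1.0 + beta * x)) * math.exp(-beta * y)

def dS_dalpha(alpha, beta, x):
    bracket = (alpha + 2.0 * beta) + beta * (alpha + beta) * x
    return -alpha * x * bracket * math.exp(-alpha * x) / (alpha + beta) ** 2

def dT_dbeta(beta, x, y):
    bracket = (1.0 + beta * x) * (1.0 + beta * x + beta * y) - 1.0
    return -y * bracket * math.exp(-beta * y) / (1.0 + beta * x) ** 2

def induced_density(alpha, beta, x, y):
    """Jacobian of (S, T) with respect to (alpha, beta); dT/dalpha is zero."""
    return dS_dalpha(alpha, beta, x) * dT_dbeta(beta, x, y)

alpha, beta, x, y = 1.0, 2.0, 1.0, 1.0
density_x_first = induced_density(alpha, beta, x, y)
density_y_first = induced_density(beta, alpha, y, x)  # roles interchanged
print(density_x_first, density_y_first)
```

At this point the two values differ in the second significant figure, although both are positive and both come from monotone pivotal quantities.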



STATISTICAL PROPERTIES OF INVERSE GAUSSIAN
DISTRIBUTIONS. II.


By M. C. K. Tweedie
Virginia Polytechnic Institute


0. Summary. Given a fixed number $n$ of observations on a variate $x$ which
has the Inverse Gaussian probability density function

\[ \left(\frac{\lambda}{2\pi x^3}\right)^{1/2} \exp\left\{-\frac{\phi^2 x}{2\lambda} + \phi - \frac{\lambda}{2x}\right\}, \qquad 0 < x < \infty, \]

for which $E(x) = \lambda/\phi = \mu$, it is shown how to find functions of the sample
mean $m$ whose expectations can be expressed suitably in terms of the parameter
$\phi$ (or $\mu$). In particular, it is shown that the conditional expectation of any
unbiased estimator $k_r$ of the $r$th cumulant $\kappa_r$ is

\[ E(k_r \mid m) = 2m\,(\tfrac12\lambda n^2)^{r-1}\, e^{g/2} \int_1^\infty (u - 1)^{2r-3}\, e^{-gu^2/2}\, du \,/\, (r - 2)!, \]

where $g = \lambda n/m$. This expectation may be evaluated either by series given in
the paper or by using published tables of numerical values of certain functions
to which it can be related. The conditional variance of the usual mean square
estimator $s^2$ of $\kappa_2$ is also found. These results give an asymptotic series for the
conditional variance of a generalization $\chi_s^2 = (n - 1)s^2/E(s^2 \mid m)$ of a statistic
discussed by Cochran. Exact formulae for the expectation of the statistic $s^2/m^3$
and its mean square error as an estimator of $\lambda^{-1}$ are given or described. This
statistic is a consistent estimator of $\lambda^{-1}$ and has asymptotically an efficiency of
$\phi/(\phi + 3)$.


1. Introduction. An earlier paper [1], which will be called ‘‘Paper I,” was 
mainly devoted to the characteristics of an Inverse Gaussian variate and its 
reciprocal and to the maximum likelihood estimation of the parameters. The 
work reported in that paper was started some considerable number of years 
ago because of some unusual features of some experimental data which had 
been obtained in the physics research laboratories of the University of Read- 
ing, England. At a casual examination these data showed a strong tendency 
for the dispersions to increase when samples were considered with increasing 
means. This tendency was confirmed by calculating arithmetic means and the 
sums of the squared deviations from the means for a large number of samples, 
plotting the two against one another and fitting regression curves, using some 
rather arbitrary assumptions about weighting. A comparable technique was 
used by Fisher, Thornton, and Mackenzie [2] in analysing some bacterial counts. 
Our case led to the question of the effect of Brownian motion in the experiments, 
and to some unsolved problems in theoretical statistics. Although the values
of the parameters in the Reading data were such as to make theoretically
precise solutions to these problems unnecessary, some exact and asymptotic
results have recently been obtained on the conditional distributions of certain
statistics at fixed values of the sample mean, and on the overall distributions 
of some statistics which can be based on them, on the assumption that the 
random variation was of the Inverse Gaussian nature which would be caused 
by Brownian motion. These results are of some theoretical interest even if the 
physical origin of the basic Inverse Gaussian family of distributions is ignored. 
It may also be added that in the originating physical experiments there were 
some relatively minor disturbing factors [3] which made the Brownian motion 
theory an incomplete statistical model. 


2. Some general formulae. In this paper the standard form adopted for the
probability density function of an Inverse Gaussian variate $x$ will be

\[ f(x; \phi, \lambda) = \left(\frac{\lambda}{2\pi x^3}\right)^{1/2} \exp\left\{-\frac{\phi^2 x}{2\lambda} + \phi - \frac{\lambda}{2x}\right\}, \tag{1} \]

with $0 < x < \infty$ (cf. (1d) of Paper I). The population mean of $x$ is $\mu = \lambda/\phi$.
The integral over $(0, \infty)$ of (1), with complex values for $\phi$ and $\lambda$, is equal to
unity if the real parts of $\phi^2/\lambda$ and $\lambda$ are positive. The distribution (1) will rarely
appear explicitly; for we shall be mainly concerned with the sample mean and
conditional distributions referred to fixed values of the sample mean.

As has previously been shown [4], a Laplacian form for a probability density 
function facilitates the derivation of the regression mean, against the sample 
mean, of any statistic which is algebraically independent of the population 
mean $\mu$ but whose overall expectation is a suitable known function of $\mu$. The
Inverse Gaussian family is of this form and this property has already been used 
in Paper I. We now proceed to give some general formulae which will be used 
subsequently to derive some further statistical results. 

We shall use $m$ to stand for the arithmetic mean of a fixed number $n$ of
independent observations on (1). As was shown in Paper I, the distribution of $m$
is of the same form as (1), with $\phi$ and $\lambda$ replaced by $\phi n$ and $\lambda n$. In the problems considered
in this paper, many of the mathematical operations are most conveniently
expressed in terms of the variables defined by

\[ g = \lambda n/m, \qquad \theta = \phi n. \tag{2} \]

The probability density function of $g$ is

\[ \exp\left(-\tfrac12\theta^2 g^{-1} + \theta - \tfrac12 g\right) \big/ \sqrt{2\pi g} \;=\; g^{-2} f(g^{-1}; \theta, 1). \tag{3} \]

This is of the same form as the density function of $y$, the reciprocal of the
Inverse Gaussian variate $x$, as given in (30) in Paper I, with $\lambda = 1$ and $\theta = 1/\mu$.

The moments of $g$ are directly obtainable from (12) and (33) of Paper I. In
particular, $\operatorname{Var}(g) = \theta + 2$ (cf. (35) of that paper). For convenience of
reference, we give

\[ \begin{aligned}
\theta E(g^{-1}) &= 1 \\
\theta^3 E(g^{-2}) &= E(g) = \theta + 1 \\
\theta^5 E(g^{-3}) &= E(g^2) = \theta^2 + 3\theta + 3 \\
\theta^7 E(g^{-4}) &= E(g^3) = \theta^3 + 6\theta^2 + 15\theta + 15 \\
\theta^9 E(g^{-5}) &= E(g^4) = \theta^4 + 10\theta^3 + 45\theta^2 + 105\theta + 105 \\
\theta^{11} E(g^{-6}) &= E(g^5) = \theta^5 + 15\theta^4 + 105\theta^3 + 420\theta^2 + 945\theta + 945 \\
\theta^{13} E(g^{-7}) &= E(g^6) = \theta^6 + 21\theta^5 + 210\theta^4 + 1260\theta^3 + 4725\theta^2 + 10395\theta + 10395
\end{aligned} \tag{4} \]

By simple algebraic operations on these results one can easily find polynomials
in $g$ whose expectations are the positive integral powers of $\theta$ from the first to
the sixth.
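
The first few moment formulae can be checked by direct numerical integration. The following is an illustrative sketch only, assuming the density of $g$ reads $\exp(-\tfrac12\theta^2 g^{-1} + \theta - \tfrac12 g)/\sqrt{2\pi g}$; the function names, the midpoint rule, and the truncation point are choices made here, not taken from the paper.

```python
import math

def density_g(g, theta):
    """Assumed density of g: exp(-theta^2/(2g) + theta - g/2) / sqrt(2*pi*g)."""
    return math.exp(-theta * theta / (2.0 * g) + theta - g / 2.0) / math.sqrt(2.0 * math.pi * g)

def moment_g(power, theta, upper=100.0, steps=100000):
    """E(g**power) by the midpoint rule on (0, upper); the tail beyond upper is negligible."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        g = (i + 0.5) * h
        total += g ** power * density_g(g, theta)
    return total * h

theta = 2.0
mean_g = moment_g(1, theta)
second_g = moment_g(2, theta)
print(moment_g(0, theta), mean_g, second_g - mean_g ** 2, theta * moment_g(-1, theta))
```

With $\theta = 2$ the printed values should be close to 1, $\theta + 1 = 3$, $\theta + 2 = 4$ and 1 respectively.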

Under fairly general conditions on the arbitrary functions $h(u)$ and $l(g)$
which appear in the following equation, it is true that

\[ E\left\{ l(g)\, e^{g/2} \int_1^\infty h(u)\, e^{-gu^2/2}\, du \;\Big|\; \theta \right\} = e^{\theta} \int_1^\infty e^{-\theta u}\, u^{-1} h(u)\, E\{ l(G u^{-2}) \mid \theta u \}\, du, \tag{5} \]

where $G$ is a random variable with $G^{-2} f(G^{-1}; \theta u, 1)$ as its probability density function.
It is both necessary and sufficient that the integrals and expectations in (5)
exist for all the required values of the variables on which they depend. A proof
may be established by expressing the expectation on the left as an integral and
reversing the order of the integrations with respect to $g$ and $u$.
To develop the first application of (5), take $l(g) = g^{-1}$. Then, since
$E\{G^{-1}u^2 \mid \theta u\} = \theta^{-1}u$,
therefore

\[ E\left\{ g^{-1} e^{g/2} \int_1^\infty h(u)\, e^{-gu^2/2}\, du \;\Big|\; \theta \right\} = \theta^{-1} \int_0^\infty e^{-\theta v}\, h(v + 1)\, dv. \tag{6} \]

From the effective uniqueness of the inverse of the Laplace transform, the
variate whose overall expectation is taken on the left of (6) must be equal to
the conditional expectation, with a fixed value of $g$ or $m$, of a variate whose
overall expectation can be equated to the expression on the right of (6). The
problem of finding the conditional expectation, or regression function, thus
becomes the relatively simple one of expressing the overall expectation in terms
of $\theta$ and then inverting the Laplace transform of $h(v + 1)$ which appears on
the right of (6). In this procedure for inversion it is permissible to take $\theta$ to be
complex, if necessary, so long as its real part is positive.

If $T$ is some unbiased estimator of a function $T(\theta)$ of $\theta$ of known form, it
therefore follows that the conditional expectation of this estimator is


E(T |g) = ge" | h(uje?"* du = g™ / hiv + 1)?” du 
71 “0 


(7) 
ryt aes igh @)I: 
However, this symbolic result (7) is not always the most convenient one to use,
as it sometimes leads to infinite series which can be avoided by introducing
functions for which tables of numerical values have been published. When

\[ T(\theta) = C\theta^{-s}, \tag{8} \]

$C$ and $s$ being independent of $\theta$, the inversion of the transform gives, in an
obvious notation,

\[ E_g'(C\theta^{-s}) = E(T \mid g) = C g^{-1} e^{g/2} \int_1^\infty (u - 1)^{s-2}\, e^{-gu^2/2}\, du \,/\, (s - 2)! \tag{9} \]

\[ = C g^{-(s+1)/2}\, e^{g/2}\, \mathrm{Hh}_{s-2}(g^{1/2}), \tag{10} \]

where $\mathrm{Hh}_{s-2}(\cdot)$ represents the Hermite polynomial function as given by Jeffreys
and Jeffreys ([5], 23.081), who give also convergent and asymptotic series
expansions of it. When $s$ is an integer greater than 2, these expansions of the
Hermite polynomials are infinite series. However, the right-hand side of (9) can
be expressed in closed form in terms of more extensively tabulated functions by
integrating it by parts. A convenient general formula, to avoid some rather
repetitious work if the requisite Hermite polynomials are not known in this
form, is

\[ \int_a^\infty f(u)\, e^{-u^2/2b}\, du = \Bigl[(1 - bDP)^{-1} f(u)\Bigr]_{u=0} \int_a^\infty e^{-u^2/2b}\, du \;+\; \Bigl[bP(1 - bDP)^{-1} f(u)\Bigr]_{u=a} e^{-a^2/2b}. \tag{11} \]

Here $a$ and $b$ are positive constants, $f(u)$ is a polynomial function of $u$, $P$ is an
operator with the general property typified by $PF(u) = [F(u) - F(0)]/u$, $F(u)$
being a polynomial in $u$, and $D$ is the operator of differentiation with respect
to $u$. The compound symbol $(1 - bDP)^{-1}$ is an abbreviation for the series
$1 + bDP(1 + bDP(1 + bDP(1 + \cdots)))$, which $= 1 + bDP + b^2DPDP + b^3DPDPDP + \cdots$.
In (9) the general formula (11) would take $f(u) = (u - 1)^{s-2}$, $a = 1$, $b = g^{-1}$.
As an example, suppose that $T(\theta) = \theta^{-6}$. Then we shall have

    f(u)      = u^4 - 4u^3 + 6u^2 - 4u + 1,  which =  1 at u = 0
    Pf(u)     = u^3 - 4u^2 + 6u - 4,         which = -1 at u = 1
    DPf(u)    = 3u^2 - 8u + 6,               which =  6 at u = 0
    PDPf(u)   = 3u - 8,                      which = -5 at u = 1
    DPDPf(u)  = 3,                           which =  3 at u = 0

Thus

\[ E_g'(\theta^{-6}) = g^{-1} e^{g/2} \int_1^\infty (u - 1)^4\, e^{-gu^2/2}\, du \,/\, 4! = g^{-1}\bigl\{(1 + 6g^{-1} + 3g^{-2})\,I - g^{-1} - 5g^{-2}\bigr\}/24, \tag{12} \]

where

\[ I = e^{g/2} \int_1^\infty e^{-gu^2/2}\, du = e^{g/2} \sqrt{2\pi/g} \int_{\sqrt g}^\infty e^{-v^2/2}\, dv \big/ \sqrt{2\pi}, \tag{13} \]

for which Laplace found a continued fraction expansion (cf. [6], p. 263; [7], p.
v; [8], Eq. (92.11)). It is of interest that the expression within the braces in
(12) is the difference between the numerator of one of the convergents of this
continued fraction and $I$ times the denominator of the same convergent, so
that a rather large number of significant figures is needed in $I$ if (12) is to be
evaluated accurately. A comparable situation occurs with other formulae in
this field of study.
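
The closed form (12) can be compared with a direct quadrature. This sketch assumes (12) reads $g^{-1}\{(1 + 6g^{-1} + 3g^{-2})I - g^{-1} - 5g^{-2}\}/24$ with $I$ as in (13); it expresses $I$ through the complementary error function, $I = \sqrt{\pi/(2g)}\,e^{g/2}\operatorname{erfc}(\sqrt{g/2})$, which follows from (13) by a change of variable. The function names and the midpoint rule are illustrative choices.

```python
import math

def I_exact(g):
    """I = exp(g/2) * integral_1^inf exp(-g u^2 / 2) du, via erfc."""
    return math.sqrt(math.pi / (2.0 * g)) * math.exp(g / 2.0) * math.erfc(math.sqrt(g / 2.0))

def closed_form(g):
    """Right-hand side of (12) as read here."""
    bracket = (1.0 + 6.0 / g + 3.0 / g ** 2) * I_exact(g) - 1.0 / g - 5.0 / g ** 2
    return bracket / (24.0 * g)

def direct_quadrature(g, upper=13.0, steps=120000):
    """g^-1 * exp(g/2) * integral_1^inf (u-1)^4 exp(-g u^2/2) du / 4!, midpoint rule."""
    h = (upper - 1.0) / steps
    total = 0.0
    for i in range(steps):
        u = 1.0 + (i + 0.5) * h
        total += (u - 1.0) ** 4 * math.exp(-g * (u * u - 1.0) / 2.0)
    return total * h / (24.0 * g)

g = 2.0
print(closed_form(g), direct_quadrature(g))
```

For moderate $g$ the two agree closely; for large $g$ the bracket in `closed_form` is a small difference of nearly equal terms, which is the loss of significance the text warns about.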


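The following special formulae, which are obtainable by combining (4) and
(5) with $h(u) = 1$, are also useful:

\[ E(I) = e^{\theta} \int_\theta^\infty x^{-1} e^{-x}\, dx, \tag{14} \]

which has a well-known asymptotic series expansion, obtainable by integrating
by parts, and may be expressed as a continued fraction (cf. [8], Eq. (92.16));
and

\[ \begin{aligned}
2 \cdot 1!\, E(gI) &= -\theta^2 E(I) + \theta + 1 \\
2^2 \cdot 2!\, E(g^2 I) &= \theta^4 E(I) - \theta^3 + \theta^2 + 6\theta + 6 \\
2^3 \cdot 3!\, E(g^3 I) &= -\theta^6 E(I) + \theta^5 - \theta^4 + 2\theta^3 + 42\theta^2 + 120\theta + 120 \\
2^4 \cdot 4!\, E(g^4 I) &= \theta^8 E(I) - \theta^7 + \theta^6 - 2\theta^5 + 6\theta^4 + 360\theta^3 + 2040\theta^2 + 5040\theta + 5040 \\
2^5 \cdot 5!\, E(g^5 I) &= -\theta^{10} E(I) + 0!\,\theta^9 - 1!\,\theta^8 + 2!\,\theta^7 - 3!\,\theta^6 + 4!\,\theta^5 \\
&\qquad + 3720\theta^4 + 35280\theta^3 + 156240\theta^2 + 9!\,\theta + 9!
\end{aligned} \tag{15} \]

In deriving these last formulae (15), it was found convenient to use, on the
right side of (5), the identity

\[ \begin{aligned}
e^{\theta} \int_\theta^\infty \sum_{i=0}^{r} c_i\, i!\, x^{-i-1} e^{-x}\, dx &= e^{\theta} \int_\theta^\infty x^{-1} e^{-x}\, dx\, (c_0 - c_1 + c_2 - c_3 + \cdots) \\
&\quad + \theta^{-1}(c_1 - c_2 + c_3 - \cdots) \\
&\quad + 1!\,\theta^{-2}(c_2 - c_3 + \cdots) \\
&\quad + 2!\,\theta^{-3}(c_3 - \cdots) \\
&\quad + \cdots + (r - 1)!\,\theta^{-r} c_r.
\end{aligned} \tag{16} \]

For completeness, the following results are also recorded:

\[ \begin{aligned}
E(g^{-1} I) &= \theta^{-2} \\
E(g^{-2} I) &= \theta^{-3} + 2\theta^{-4} \\
E(g^{-3} I) &= \theta^{-4} + 5\theta^{-5} + 8\theta^{-6} \\
E(g^{-4} I) &= \theta^{-5} + 9\theta^{-6} + 33\theta^{-7} + 48\theta^{-8}.
\end{aligned} \tag{17} \]

In conjunction with (4), these results enable polynomials in $g^{-1}$ to be found
which, when multiplied by $I$, give expressions whose expectations are the negative
integral powers of $\theta$ from the first to the seventh. The results for the first,
third, fifth, sixth and seventh negative powers are given essentially in (12) and
(23).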

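It is also convenient to introduce a further new variable to simplify the
presentation of the results in the following sections. We write

\[ J = 1 - gI \tag{18a} \]

\[ = 1 - g\, e^{g/2} \int_1^\infty e^{-gu^2/2}\, du \tag{18b} \]

\[ \begin{aligned}
&= g^{-1} - 3g^{-2} + \cdots + (-1)^{r-1}\, 3 \cdot 5 \cdots (2r - 1)\, g^{-r} \\
&\quad + (-1)^{r}\, 3 \cdot 5 \cdots (2r - 1)(2r + 1)\, g^{-r} e^{g/2} \int_1^\infty u^{-2r-2}\, e^{-gu^2/2}\, du.
\end{aligned} \tag{19} \]

It can be shown that $J$ decreases monotonely from 1 to 0, and that $gJ$ increases
monotonely from 0 to 1, when $g$ increases from 0 to infinity.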

The form (19) gives an asymptotic series which is satisfactory when $g$ is
sufficiently large. A useful expansion for moderately large values of $g$ is the
continued fraction

\[ J = \cfrac{1}{g + \cfrac{3}{1 + \cfrac{2}{g + \cfrac{5}{1 + \cfrac{4}{g + \cfrac{7}{1 + \text{etc.}}}}}}} \tag{20} \]

The constants in the successive partial numerators are the positive integers,
following the sequence with alternating reversals 1; 3, 2; 5, 4; 7, 6; 9, 8; etc.
Alternatively Laplace's continued fraction could be used to find $I$. When $g$
approaches zero,

\[ J = 1 - (\tfrac12\pi g)^{1/2} + O(g). \tag{21} \]
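
The continued fraction can be evaluated from the bottom up. The sketch below assumes the reading in which the partial denominators alternate $g, 1, g, 1, \ldots$ and the partial numerators run 1; 3, 2; 5, 4; 7, 6; ..., and it compares the truncated fraction with $J = 1 - gI$ computed through the complementary error function. The function names and truncation depth are illustrative.

```python
import math

def J_exact(g):
    """J = 1 - g*I with I = sqrt(pi/(2g)) * exp(g/2) * erfc(sqrt(g/2))."""
    I = math.sqrt(math.pi / (2.0 * g)) * math.exp(g / 2.0) * math.erfc(math.sqrt(g / 2.0))
    return 1.0 - g * I

def J_continued_fraction(g, levels):
    """Truncate the continued fraction after `levels` partial numerators."""
    numerators = [1.0]
    k = 1
    while len(numerators) < levels:
        numerators.append(2.0 * k + 1.0)      # 3, 5, 7, ...
        if len(numerators) < levels:
            numerators.append(2.0 * k)        # 2, 4, 6, ...
        k += 1
    denominators = [g if i % 2 == 0 else 1.0 for i in range(levels)]
    value = 0.0
    for a, b in zip(reversed(numerators), reversed(denominators)):
        value = a / (b + value)
    return value

g = 25.0
print(J_continued_fraction(g, 2), J_continued_fraction(g, 4), J_continued_fraction(g, 40), J_exact(g))
```

Truncating after two and four partial numerators gives $1/(g+3)$ and $(g+7)/(g^2+10g+15)$, the same polynomials that appear inside the braces of the conditional cumulant formulae.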


In the notation adopted in the National Bureau of Standards’ Tables of 
Probability Functions ([9], p. xix), 


I a g"F(g"”), J het 1 i L(27*?9'*). 


Hence a published table of either $F(x)$ or $L(x)$ may be used to shorten the
computations, provided it gives enough significant digits. Table I gives $I$, $J$
and $gJ$ for certain values of $g$, chosen because Sheppard's 1939 tables [7] could
be used for them without interpolation. Burgess's Table 3 [6], which is
unfortunately 'seriously infested with error' according to page x of Sheppard's
tables [7], would similarly lead directly to values for which $\tfrac12 g$ is an exact square.

TABLE I

      g         I                J                gJ

      1    .65567 95424     .34432 04576     .34432 04576
      4    .21068 46146     .15726 15414     .62904 61657
      9    .10153 00996     .08622 91039     .77606 19348
     16    .05916 30957     .05339 04683     .85424 74935+
     25    .03856 16209     .03595 94764     .89898 69106
     36    .02706 29435-    .02573 40346     .92642 52463
     49    .02001 48834     .01927 07158     .94426 50756
     64    .01539 14954     .01494 42939     .95643 48119
     81    .01219 85870     .01191 44568     .96507 10004
    100    .00990 28596     .00971 40353     .97140 35283
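
The tabulated values can be reproduced with a standard library error function. This is a sketch, assuming $I = e^{g/2}\int_1^\infty e^{-gu^2/2}\,du = \sqrt{\pi/(2g)}\,e^{g/2}\operatorname{erfc}(\sqrt{g/2})$ and $J = 1 - gI$, and assuming that the rows of Table I correspond to $g = 1, 4, 9, \ldots, 100$ (the ratio of the $gJ$ and $J$ columns in each row); the function name `table_row` is hypothetical.

```python
import math

def table_row(g):
    """Return (I, J, gJ) for a given g."""
    I = math.sqrt(math.pi / (2.0 * g)) * math.exp(g / 2.0) * math.erfc(math.sqrt(g / 2.0))
    J = 1.0 - g * I
    return I, J, g * J

for g in (1, 4, 9, 16, 25, 36, 49, 64, 81, 100):
    I, J, gJ = table_row(float(g))
    print(f"{g:4d}  {I:.10f}  {J:.10f}  {gJ:.10f}")
```

Double precision is adequate here because $J$ is formed from a single subtraction; the sensitivity discussed in the text arises later, when $J$ is multiplied by large polynomials in $g$ and nearly cancelled.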


3. Conditional means and variances of certain statistics. The cumulants of
the general Inverse Gaussian variate were given in (9) of Paper I. Since they
are of the form of (8), we can use (9) of the present paper and get

\[ E(k_r \mid m; \lambda, n) = 2m\,(\tfrac12\lambda n^2)^{r-1}\, e^{g/2} \int_1^\infty (u - 1)^{2r-3}\, e^{-gu^2/2}\, du \,/\, (r - 2)!, \tag{22} \]

from which

\[ \begin{aligned}
E_m'(\kappa_2) &= E(k_2 \mid m; \lambda, n) = n m^2 J = \lambda^{-1} m^3 g J = \lambda^2 n^3 g^{-2} J \\
E_m'(\kappa_3) &= E(k_3 \mid m; \lambda, n) = \tfrac12 n^2 m^3 \{(g + 3)J - 1\} \\
E_m'(\kappa_4) &= E(k_4 \mid m; \lambda, n) = \tfrac18 n^3 m^4 \{(g^2 + 10g + 15)J - g - 7\}.
\end{aligned} \tag{23} \]

These formulae may be verified by using (4) and (17). The terms within the
braces in (23) are the same as occur in the successive convergents of the
continued fraction expansion (20), so that, in a similar way as the computation
of (12), high precision is needed in $J$ to give accurate numerical results. For
example, when $g = 25$, if we took $I = 0.0385616$ or $J = 0.035959$
or $L = 0.964041$, which are correct according to the usual rounding-off rules
to the last digit shown, the rounding error (which is 0.21 of the last place shown
for $I$, and 0.48 of the last place shown for $J$ and $L$) would give a value for $E_m'(\kappa_4)$
which would be 12 per cent too high if $I$ were used, or 11 per cent too low if
$J$ or $L$ were used.
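
The loss of significance in this example can be reproduced directly. The sketch assumes that the statistic in question is governed by the bracket $(g^2 + 10g + 15)J - g - 7$ of the fourth-cumulant formula, a reading adopted here because it reproduces the quoted 12 and 11 per cent; the variable names are illustrative.

```python
import math

def J_exact(g):
    I = math.sqrt(math.pi / (2.0 * g)) * math.exp(g / 2.0) * math.erfc(math.sqrt(g / 2.0))
    return 1.0 - g * I

def bracket_kappa4(g, J):
    """The factor (g^2 + 10g + 15) J - g - 7 from the fourth-cumulant formula."""
    return (g * g + 10.0 * g + 15.0) * J - g - 7.0

g = 25.0
exact = bracket_kappa4(g, J_exact(g))
from_rounded_I = bracket_kappa4(g, 1.0 - g * 0.0385616)   # I rounded to 7 decimals
from_rounded_J = bracket_kappa4(g, 0.035959)              # J rounded to 6 decimals
error_I = from_rounded_I / exact - 1.0
error_J = from_rounded_J / exact - 1.0
print(f"relative error using I: {error_I:+.3f}, using J: {error_J:+.3f}")
```

A rounding error of a few units in the eighth decimal of $I$ is multiplied by $g(g^2 + 10g + 15) = 22250$ and then compared with a bracket of size about 0.004, which accounts for the double-digit percentage errors.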

When $g$ is large, the asymptotic expansion of the Hermite polynomial given
by Jeffreys and Jeffreys ([5], 23.082) gives

\[ E(k_r \mid m; \lambda, n) \sim 2m \left(\frac{m^2}{2\lambda}\right)^{r-1} \sum_{t=0}^{\infty} \frac{(2r + 2t - 3)!}{t!\,(r - 2)!\,(-2g)^t}. \tag{24} \]

From this, or by using the asymptotic series expansion (19) for $J$,

\[ \begin{aligned}
E(k_2 \mid m; \lambda, n) &\sim \lambda^{-1} m^3 (1 - 3g^{-1} + 15g^{-2} - 105g^{-3} + 945g^{-4} - \cdots) \\
E(k_3 \mid m; \lambda, n) &\sim 3\lambda^{-2} m^5 (1 - 10g^{-1} + 105g^{-2} - 1260g^{-3} + 17325g^{-4} - \cdots) \\
E(k_4 \mid m; \lambda, n) &\sim 15\lambda^{-3} m^7 (1 - 21g^{-1} + 378g^{-2} - 6930g^{-3} + 135135g^{-4} - \cdots).
\end{aligned} \tag{25} \]
4. Conditional variance of Cochran's $\chi^2$ statistic. By the same technique of
inverting a Laplace transform, as was shown previously ([4], p. 48), the
conditional variance of the usual unbiased mean square estimator, viz,
$s^2 = \sum (x - m)^2/(n - 1)$, of $\kappa_2$ can be found. We first find

\[ E(s^4 \mid m; \lambda, n) = \frac{n + 1}{n - 1}\, E_m'(\kappa_2^2) + \frac{1}{n}\, E_m'(\kappa_4), \tag{26} \]

for which an exact expression in terms of $I$ or $J$ can be found by using (12) and
(23), since $\kappa_2^2 = \theta^{-6}\lambda^4 n^6$. The required conditional variance is obtainable by
subtracting the square of $E(s^2 \mid m; \lambda, n)$ from (26). The asymptotic form is

\[ \operatorname{Var}(s^2 \mid m; \lambda, n) \sim \frac{2m^6}{(n - 1)\lambda^2} \left\{ 1 + \frac{3(n - 6)m}{n\lambda} - \frac{6(12n - 47)m^2}{n^2\lambda^2} + \frac{30(47n - 152)m^3}{n^3\lambda^3} - \cdots \right\}. \tag{27} \]

For comparison with some of the results already given [4] for the Inverse
Poisson (or chi-square type) and the Poisson types of distribution, and also
to provide a more direct comparison with the exact chi-square distribution
which occurs in the distribution of the maximum likelihood estimator of $\lambda$, we
may consider a measure of dispersion equivalent to that studied by Cochran
[10] for other distributions:

\[ \chi_s^2 = (n - 1)s^2 / E(s^2 \mid m; \lambda, n). \tag{28} \]

Obviously

\[ E(\chi_s^2 \mid m; \lambda, n) = n - 1. \]

By using (25) and (27), we get, when $g$ is large,

\[ \operatorname{Var}(\chi_s^2 \mid m; \lambda, n) \sim 2(n - 1) \left\{ 1 + \frac{3m}{\lambda}\left(1 - \frac{4}{n}\right) - \frac{54m^2}{n\lambda^2}\left(1 - \frac{19}{6n}\right) + \cdots \right\}. \tag{29} \]

It will be noticed that the absolute values of the leading terms in the series
in both (27) and (29) are minimized, as functions of the sample size $n$, and
simultaneously become very insensitive to the precise value of $\lambda$, when $n$ is
quite small, between 3 and 6 approximately. With the Inverse Poisson
distribution the second term in the series corresponding to (29) was found to
vanish when $n = 5$. It may therefore be surmised that the standard chi-square
distribution will be a good approximation to that of $\chi_s^2$ with samples of about
this size, with either the Inverse Gaussian or the Inverse Poisson distribution,
assuming that the correct regression formula $E_m'(\kappa_2)$ is used in (28). An
approximate confidence or fiducial interval for $\lambda$ may be derived from this result.


5. An approximate estimator of $\lambda^{-1}$. The statistic $s^2 m^{-3}$ might be used as an
estimator of $\lambda^{-1}$, as an alternative to the estimator of maximum likelihood
discussed in Paper I. The conditional mean and variance of $s^2 m^{-3}$ are obtainable
from the above results, to provide a partial check on its suitability for this
purpose. Thus from (23),

\[ E(s^2 m^{-3} \mid m; \lambda, n) = \lambda^{-1} g J = \lambda^{-1}(g - g^2 I). \tag{30} \]

The series expansions of this formula and of the formula for the conditional
variance are obtainable immediately from (25) and (27). Both the expectation
and the variance depend to some extent on $m$, in contrast to the corresponding
results which follow from (22) in Paper I, viz,

\[ E\Bigl\{ \sum (x_i^{-1} - m^{-1})/(n - 1) \;\Big|\; m; \lambda, n \Bigr\} = \lambda^{-1}, \tag{31} \]

\[ \operatorname{Var}\Bigl\{ \sum (x_i^{-1} - m^{-1})/(n - 1) \;\Big|\; m; \lambda, n \Bigr\} = 2/\lambda^2(n - 1). \tag{32} \]


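The overall expectation of $s^2 m^{-3}$ is, from (4), (15) and (30),

\[ E\{s^2 m^{-3} \mid \phi, \lambda, n\} = \lambda^{-1} \Bigl\{ 2 + 2\phi n - (\phi n)^2 + (\phi n)^3 - (\phi n)^4 e^{\phi n} \int_{\phi n}^\infty x^{-1} e^{-x}\, dx \Bigr\} \Big/ 8 \tag{33} \]

\[ = \lambda^{-1} \Bigl\{ 1 - \frac{4!}{2^2 \cdot 2!\,\phi n} + \frac{5!}{2^2 \cdot 2!\,(\phi n)^2} - \cdots + \frac{(-1)^{r}(r - 1)!}{2^2 \cdot 2!\,(\phi n)^{r-4}} + \frac{(\phi n)^4 e^{\phi n}}{2^2 \cdot 2!} \int_{\phi n}^\infty \frac{(-1)^{r+1} r!}{x^{r+1}}\, e^{-x}\, dx \Bigr\}. \tag{34} \]

The formula (34) might be used as an asymptotic series for computations, or a
continued fraction might be found.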

The conditional mean square error, $E(s^4 m^{-6} - 2\lambda^{-1} s^2 m^{-3} + \lambda^{-2} \mid m; \lambda, n)$,
may be evaluated by using the results of (23) and (26). The exact overall mean
square error can be found from it by using (4) and (15), but the full expression
is too lengthy to be given here. Its asymptotic series expansion is


2-3 —1\2 2 3(n a8 6) 
E{(s'‘m rA~)" | oA, 2} mea + ee 
15(5n — 33) , 105(27n — 33) | 
Ps i. ct tink, ond. 
This formula is sufficient to show that $s^2 m^{-3}$ is a "consistent" estimator of
$\lambda^{-1}$. By comparison with (31) and (32), its efficiency, on the basis of the mean
square error or the variance, is seen to be asymptotically $\phi/(\phi + 3)$.


(35) 


6. Acknowledgment. The work reported in this paper was done under a 
grant from the National Science Foundation. 


REFERENCES 


[1] M. C. K. Tweedie, "Statistical properties of Inverse Gaussian distributions. I,"
Ann. Math. Stat., Vol. 28 (1957), pp. 362-377.

[2] R. A. Fisher, H. G. Thornton, and W. A. Mackenzie, "The accuracy of the plating
method of estimating the density of bacterial populations," Annals of Applied
Biology, Vol. 9 (1922), pp. 325-359.

[3] M. C. K. Tweedie, "A mathematical investigation of some electrophoretic
measurements on colloids," unpublished thesis for M.Sc. degree, University of Reading,
England, 1941.

[4] M. C. K. Tweedie, "Functions of a statistical variate with given means, with special
reference to Laplacian distributions," Proc. Cambridge Philos. Society, Vol. 43
(1947), pp. 41-49.

[5] H. Jeffreys and B. S. Jeffreys, Methods of Mathematical Physics, Cambridge
University Press, Cambridge, 1946; 3d Ed., 1956.

[6] J. Burgess, "On the definite integral $2\pi^{-1/2}\int_0^t e^{-t^2}\,dt$, with extended tables of values,"
Trans. Roy. Soc. of Edinburgh, Vol. 39 (1898), pp. 257-321.

[7] W. F. Sheppard, The Probability Integral, British Association for the Advancement
of Science Mathematical Tables, Vol. 7, Cambridge University Press, Cambridge,
1939.

[8] H. S. Wall, Analytical Theory of Continued Fractions, Van Nostrand, New York, 1948.

[9] Tables of Probability Functions, Vol. 2, National Bureau of Standards, 1942.

[10] W. G. Cochran, "The $\chi^2$ distribution for the binomial and Poisson series, with small
expectations," Ann. Eugenics, London, Vol. 7 (1937), pp. 207-217.





THE MEAN AND VARIANCE OF THE MAXIMUM OF THE 
ADJUSTED PARTIAL SUMS OF A FINITE NUMBER OF 
INDEPENDENT NORMAL VARIATES 


By M. E. Solari and A. A. Anis
Chelsea Polytechnic, London 


1. Introduction. In planning the storage capacity of a reservoir it is desirable 
to avoid in so far as is practicable both the loss of water that occurs if the res- 
ervoir overflows and the harm that is done if the reservoir is empty when water 
is needed. Hurst [1] on the basis of data from a long series of annual totals of 
river discharges has discussed the relation between the capacity, the inflow and 
its variability, and the draft from a reservoir. In the present paper the theoreti- 
cal analysis of the problem as studied by Anis and Lloyd is carried further. 

If, for a period of $n$ years, the annual increment of inflow minus draft is
represented by the variable $X_i$ $(i = 1, \cdots, n)$ and the partial sums of these
increments by $S_r = \sum_{i=1}^{r} X_i$ $(r = 1, \cdots, n)$, then the maximum $U_n$ over the $n$-year
period of these $S_r$ is the maximum accumulated storage when there is no deficit,
their minimum $L_n$ gives the maximum accumulated deficit when there is no
storage, and their range $R_n = U_n - L_n$ gives the capacity necessary to avoid
the two difficulties mentioned above. Anis and Lloyd [3] have studied the
distribution of $U_n$ and $R_n$ for the idealized case in which the $X_i$ are taken as
independent standard normal variables and have shown that, for any $n \ge 2$, the
expected value of the maximum is $(2\pi)^{-1/2} \sum_{s=1}^{n-1} s^{-1/2}$ and hence that the
asymptotic value of the mean range, which is twice that of the maximum,
agrees with the value $2[(2/\pi)n]^{1/2}$ obtained by Feller [2]. Furthermore Anis [4]
has shown the second moment about the origin of the maximum to be

\[ \tfrac12 (n + 1) + \frac{1}{2\pi} \sum_{s=2}^{n-1} \sum_{t=1}^{s-1} [t(s - t)]^{-1/2} \]

and has obtained [5] a recurrence relation for computing moments of higher
order by means of which he has tabulated the values of the first four moments
for $n = 2, 3, \cdots, 15$.
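
The quoted mean of the maximum can be illustrated by simulation. The sketch below assumes the formula reads $(2\pi)^{-1/2}\sum_{s=1}^{n-1} s^{-1/2}$ and that the maximum is taken over $S_1, \ldots, S_n$; the function names, the seed, and the number of trials are arbitrary choices for illustration.

```python
import math
import random

def expected_maximum(n):
    """(2*pi)^(-1/2) * sum_{s=1}^{n-1} s^(-1/2), as read from the text."""
    return sum(s ** -0.5 for s in range(1, n)) / math.sqrt(2.0 * math.pi)

def simulated_maximum(n, trials, seed=0):
    """Average of max(S_1, ..., S_n) over simulated standard normal increments."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        partial, best = 0.0, -math.inf
        for _ in range(n):
            partial += rng.gauss(0.0, 1.0)
            best = max(best, partial)
        total += best
    return total / trials

n = 10
print(expected_maximum(n), simulated_maximum(n, 40000))
# Ratio of twice the mean maximum to the asymptotic mean range 2*sqrt(2n/pi):
print(expected_maximum(10000) / math.sqrt(2.0 * 10000 / math.pi))
```

The last ratio approaches 1 slowly, since the sum differs from $2\sqrt{n}$ by a bounded amount.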

However, from both the engineering and the statistical point of view it is
sometimes desirable to separate the effect of inflow and draft, since the latter
may be controlled in such a way that the former is the decisive random
variable. In his paper Hurst considered the effect that would have been obtained
by a rule of release which made the annual draft equal to the mean annual
inflow for the $n$-year period, $\bar X_n = (1/n)\sum_{i=1}^{n} X_i$, so that the accumulation
after $r$ years became the adjusted partial sum $S_r' = \sum_{i=1}^{r} X_i - r\bar X_n$. For these
adjusted partial sums Hurst and Feller both obtained $[(\pi/2)n]^{1/2}$ for the
asymptotic mean range.$^1$ Statistically the study of these adjusted partial sums is
advantageous because, since they are distributed about zero provided merely the
individual $X_i$ are distributed about a common though not necessarily zero
mean, there is now no loss of generality in taking that common mean to be
zero.
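
The invariance claimed for the adjusted partial sums is easy to confirm. This is a minimal sketch with made-up inflow figures; the function name `adjusted_partial_sums` is hypothetical. It computes $S_r' = \sum_{i \le r} X_i - r\bar X_n$ and shows that adding a constant to every $X_i$ leaves the adjusted sums unchanged, and that the last adjusted sum is zero.

```python
def adjusted_partial_sums(xs):
    """Return [S'_1, ..., S'_n] with S'_r = sum of the first r values minus r times the mean."""
    n = len(xs)
    mean = sum(xs) / n
    sums, running = [], 0.0
    for r, x in enumerate(xs, start=1):
        running += x
        sums.append(running - r * mean)
    return sums

inflows = [3.0, 1.5, 4.2, 2.8, 0.9]          # illustrative values only
shifted = [x + 10.0 for x in inflows]        # same data about a different common mean
original_sums = adjusted_partial_sums(inflows)
shifted_sums = adjusted_partial_sums(shifted)
print(original_sums)
print(shifted_sums)
```

Because the common mean cancels, results derived for zero-mean increments apply to increments with any common mean.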

In this paper we obtain, for the case in which the $X_i$ are independent normal
variates with a common mean and unit variance, the distribution of the
maximum of the adjusted partial sums and find, for any $n \ge 2$, the first and second
moments about the origin to be


’ 1 ne 4 dy 

win) = 3 4/2E (n 8) 

; ifnr-1, Yass s(2s — n) } 
a(n) 34 . ae > t=1.+/(n — s)@(s — 0? 


6 
with the asymptotic values }[{(x/2)n]"? and (n/2) — n’” respectively. Since 
the distribution of the minimum of the adjusted partial sums is, as in the case 
of the unadjusted sums, that of minus the maximum, the mean range is twice 
the mean of the maximum so that our asymptotic value is seen to agree with 
that obtained by Feller and Hurst. 
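The two moment formulas can be evaluated directly. The following is a minimal sketch, assuming the reading of the displayed sums given above (in particular the factor $t^3(s-t)^3$ under the root in the double sum); the function names are illustrative and not from the paper. For $n = 2$ and $n = 3$ the maximum involves at most two correlated normal variables, so the values can be compared with elementary closed forms.

```python
import math

def mean_max(n):
    """First moment about the origin of the maximum of the adjusted partial sums (n >= 2)."""
    total = sum(1.0 / math.sqrt(s * (n - s)) for s in range(1, n))
    return 0.5 * math.sqrt(n / (2.0 * math.pi)) * total

def second_moment_max(n):
    """Second moment about the origin of the same maximum (n >= 2)."""
    total = 0.0
    for s in range(2, n):
        # inner sum over t of [t (s - t)]^(-3/2)
        inner = sum((t * (s - t)) ** -1.5 for t in range(1, s))
        total += s * (2 * s - n) / math.sqrt(n - s) * inner
    return ((n * n - 1.0) / n + math.sqrt(n) / (2.0 * math.pi) * total) / 6.0

if __name__ == "__main__":
    for n in (2, 3, 10):
        print(n, mean_max(n), second_moment_max(n))
```

For $n = 2$ the single adjusted sum is $N(0, 1/2)$, which gives a mean of $1/(2\sqrt{\pi})$ and a second moment of $1/4$; the sketch reproduces both.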


2. Distribution of the maximum of the adjusted partial sums. In addition to
the notation already introduced in Section 1, we shall use throughout $\phi(x)$ to
denote the probability density function of a standard normal variate, i.e.,
$\phi(x) = (2\pi)^{-1/2}\exp(-x^2/2)$. In this connection it should be noted that in
accordance with the remarks above, our results will be valid if the annual
increments are independent and normally distributed about a common mean with
unit variance since reduction to the standard normal variates $X_i$ will not affect
the $S'_r$.

We shall also use $P_n(u)$ and $p_n(u)$ to denote respectively the distribution
function and the density function of the maximum over r, $U'_n$, of the adjusted
partial sums $S'_r$. Since by definition $S'_n$ is zero, we consider the n - 1 sums
$S'_r$ $(r = 1, \dots, n-1)$ and let their maximum be $V_n$ so that $U'_n = \mathrm{Max}\,[V_n, 0]$.
Then $P_n(u) = \Pr\{U'_n \le u\} = 0$ for $u < 0$ and $P_n(u) = \Pr\{V_n \le u\}$ for $u \ge 0$,
so that $P_n(u)$ has a saltus at $u = 0$ and $p_n(u)$ is not defined there. For $u < 0$,
$p_n(u) = 0$ and for $u > 0$, $p_n(u) = dP_n(u)/du$.

For any $n \ge 2$


$$P_n(u) = \int\!\cdots\!\int_K (2\pi)^{-n/2}\exp\left(-\tfrac12 x'x\right)dx \qquad (u \ge 0), \tag{1}$$


1 However, the values of the range of these adjusted partial sums observed by Hurst 
appeared to be more nearly proportional to n-™. For this reason the authors of this paper 
thought the exact formula for the mean range of these adjusted partial sums as a function 
of n would be of interest. 
where the region of integration K is defined by


$$K:\ \sum_{i=1}^{r} X_i - r\bar X_n \le u \qquad (r = 1, \dots, n-1)$$

and x is the n-dimensional vector of the observations $X_1, \dots, X_n$. We
introduce the transformation

$$x = By, \tag{2}$$


where B is the n X n matrix given by 


It is easy to see that the Jacobian $J_n$ of the transformation (2) is given by the
recurrence relation


$$J_n = 1 + J_{n-1}$$


and hence that $J_n = n$. Now $x'x = y'B'By = y'Cy$ where C is the n × n
matrix given by


—1 


Hence $x'x = y'Cy = n y_n^2 + Y'AY$, where A is the (n - 1) × (n - 1) matrix
obtained from C by omitting the last column and the last row and Y is the
(n - 1)-dimensional vector $y_1, y_2, \dots, y_{n-1}$. Reverting to (1) and (2), we see
that $y_n = \bar X_n$, $y_r = S'_r$ $(r = 1, \dots, n-1)$ and hence $P_n(u)$ can be put in the
form


$$P_n(u) = \sqrt{n}\,(2\pi)^{-(n-1)/2}\int_{-\infty}^{u}\!\cdots\!\int_{-\infty}^{u}\exp\left(-\tfrac12 Y'AY\right)\prod_{r=1}^{n-1}dy_r. \tag{3}$$
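The displayed matrices B and C are not reproduced in this copy, but the identification $y_n = \bar X_n$, $y_r = S'_r$ determines the transformation. The sketch below builds one matrix consistent with that identification (an assumption, since the printed form of B is not available here) and checks, in exact rational arithmetic, that its determinant is n and that $B'B$ splits into a tridiagonal block with 2 on the diagonal and -1 beside it, plus the entry n in the last corner. The helper names are illustrative.

```python
from fractions import Fraction

def build_B(n):
    """Matrix B with x = B y, assuming y_r = S'_r for r < n and y_n = sample mean."""
    B = [[Fraction(0)] * n for _ in range(n)]
    for r in range(n):                 # row r describes x_{r+1}
        B[r][n - 1] = Fraction(1)      # every x_i contains the mean y_n
        if r < n - 1:
            B[r][r] += 1               # + S'_{r+1}
        if r >= 1:
            B[r][r - 1] -= 1           # - S'_r
    return B

def determinant(M):
    """Determinant by Gaussian elimination with exact fractions."""
    M = [row[:] for row in M]
    n, det = len(M), Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            det = -det
        det *= M[c][c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n):
                M[r][k] -= factor * M[c][k]
    return det

def gram(B):
    """Return C = B'B."""
    n = len(B)
    return [[sum(B[k][i] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

if __name__ == "__main__":
    for n in range(2, 6):
        print(n, determinant(build_B(n)), gram(build_B(n))[n - 1][n - 1])
```

The block structure of $B'B$ is what allows the integration over $y_n$ to be carried out separately, leaving the (n - 1)-fold integral in (3).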


It is worth noting here that the integrand in (3) is, except for a constant factor,
precisely the integrand in expression (6.1) in Anis and Lloyd [3]. Using the value
obtained there for that integral, we deduce immediately that


$$P_n(0) = \frac{1}{n}. \tag{4}$$


Differentiating (3) by using the rule for differentiation of multiple integrals
when the integrand does not contain the variable with respect to which we
differentiate but the limits of integration do, which is justifiable since our
integrand is a well behaved function, we obtain immediately


$$p_n(u) = \sqrt{2\pi n}\sum_{s=0}^{n-2} h_s(u)\,h_{n-2-s}(u) \qquad (n \ge 2), \tag{5}$$


where $h_s(u)$ is the integral defined by Anis and Lloyd [3]; i.e., for $s \ge 1$


$$h_s(u) = \int_0^\infty\!\cdots\!\int_0^\infty \phi(u-u_1)\phi(u_1-u_2)\cdots\phi(u_{s-1}-u_s)\phi(u_s)\,du_1\cdots du_s \tag{6}$$

and
$h_0(u) = \phi(u)$.


Since the probability density functions for the maximums of the unadjusted 
partial sums are expressible (Anis and Lloyd [3]) as a linear combination of these 
integrals h,(u), it is now possible to express p,(u) in terms of those probability 
density functions. However, it proves more convenient to obtain the moments 
of the distribution (5) directly from the properties of the integrals h,(u). 


3. Properties of the integrals $h_s(u)$.
LEMMA 1.


$$h_s(0) = (2\pi)^{-1/2}(s+1)^{-3/2} \qquad (s \ge 0). \tag{7}$$


This was proved by Anis and Lloyd [3], since h,(0) = (2x)7'**?¢, in their 
notation, and is repeated here merely for completeness. 
LEMMA 2.


$$h_s(\infty) = 0 \qquad (s \ge 0). \tag{8}$$


To prove this we note that, by virtue of its definition (6) as an integral, $h_s(u)$
is non-negative for all values of u. Hence the probability density functions $p_n(u)$
are from (5) the sum of non-negative terms $h_s(u)h_{n-2-s}(u)$ for all n and s. Since
$p_n(\infty)$ is zero, no one of these terms and so no $h_s(u)$ can differ from zero at
infinity.

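LEMMA 3. For $s \ge 1$


$$h_s(u) = \int_0^\infty \phi(u-y)\,h_{s-1}(y)\,dy, \tag{9}$$

$$h'_s(u) = \int_0^\infty \phi(u-y)\,y\,h_{s-1}(y)\,dy - u\,h_s(u)
 = \int_0^\infty \phi(u-y)\,h'_{s-1}(y)\,dy + h_{s-1}(0)\,\phi(u), \tag{10}$$

$$h''_s(u) = \int_0^\infty \phi(u-y)\,y^2 h_{s-1}(y)\,dy - 2u\,h'_s(u) - (u^2+1)\,h_s(u)
 = \int_0^\infty \phi(u-y)\,y\,h'_{s-1}(y)\,dy - u\,h'_s(u)$$

$$\qquad\quad = \int_0^\infty \phi(u-y)\,h''_{s-1}(y)\,dy + h_{s-1}(0)\,\phi'(u) + h'_{s-1}(0)\,\phi(u). \tag{11}$$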


To prove this lemma we note that the reduction formula (9) for h,(u) itself 
follows immediately from the definition (6) of h,(u). The reduction formulae 
(10) and (11) for the derivatives then follow by differentiation of (9) with some 
rearranging and integration by parts. 
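Reduction formula (9) also gives a simple numerical check of Lemma 1. The sketch below iterates (9) on a grid with the trapezoidal rule and compares $h_s(0)$ with $(2\pi)^{-1/2}(s+1)^{-3/2}$. The truncation point, step size and function names are choices made for this illustration only; they are not part of the paper.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def h_at_zero(s_max, upper=8.0, step=0.02):
    """Return [h_0(0), ..., h_{s_max}(0)] by iterating formula (9) on [0, upper]."""
    m = int(round(upper / step))
    kernel = [phi(k * step) for k in range(m + 1)]   # phi(u - y) depends only on |u - y|
    weights = [step] * (m + 1)
    weights[0] = weights[m] = 0.5 * step             # trapezoidal end weights
    h = kernel[:]                                    # h_0(y) = phi(y) on the grid
    values = [h[0]]
    for _ in range(s_max):
        wh = [w * v for w, v in zip(weights, h)]
        h = [sum(kernel[abs(i - j)] * wh[j] for j in range(m + 1)) for i in range(m + 1)]
        values.append(h[0])
    return values

if __name__ == "__main__":
    for s, value in enumerate(h_at_zero(3)):
        print(s, value, (2.0 * math.pi) ** -0.5 * (s + 1) ** -1.5)
```

The agreement is limited only by the quadrature error, which is small here because the integrands are smooth and decay like a Gaussian.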

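LEMMA 4. For $s \ge 1$


$$h_s(0) = \int_0^\infty h_0(y)\,h_{s-1}(y)\,dy, \tag{12}$$


$$h'_s(0) = \int_0^\infty y\,h_0(y)\,h_{s-1}(y)\,dy = h_0(0)\,h_{s-1}(0) + \int_0^\infty h_0(y)\,h'_{s-1}(y)\,dy, \tag{13}$$


$$h''_s(0) = \int_0^\infty y^2 h_0(y)\,h_{s-1}(y)\,dy - h_s(0) = \int_0^\infty y\,h_0(y)\,h'_{s-1}(y)\,dy
 = h_0(0)\,h'_{s-1}(0) + \int_0^\infty h_0(y)\,h''_{s-1}(y)\,dy. \tag{14}$$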


These results follow immediately on putting u = 0 in the reduction formulae 
of Lemma 3. 


4. Moments of the distribution of the maximum of the adjusted partial sums.
In this paragraph for simplicity of notation we shall omit the limits of
integration which will be from zero to infinity throughout and write $h_s$ only wherever
we mean this function to be evaluated at zero. Furthermore we shall consider
the distribution


$$p_{n+2}(u) = \sqrt{2\pi(n+2)}\sum_{s=0}^{n} h_s(u)\,h_{n-s}(u) \qquad (n \ge 0).$$

For this distribution we write the rth moment about the origin


$$\mu'_r(n+2) = \int u^r p_{n+2}(u)\,du \qquad (r \ge 0)$$


in the form


$$\mu'_r(n+2) = \sqrt{2\pi(n+2)}\sum_{s=0}^{n} I_{n,r}(s), \tag{15}$$


where

$$I_{n,r}(s) = \int u^r h_s(u)\,h_{n-s}(u)\,du. \tag{16}$$


For $I_{n,0}(s)$ we obtain on applying reduction formula (9)

$$I_{n,0}(s) = \int h_{n-s}(u)\,h_s(u)\,du = \int h_{n-s}(u)\int \phi(u-y)\,h_{s-1}(y)\,dy\,du$$


and, reversing the order of integration and using (9) again,


$$I_{n,0}(s) = \int h_{s-1}(y)\int \phi(u-y)\,h_{n-s}(u)\,du\,dy
 = \int h_{s-1}(y)\,h_{n-s+1}(y)\,dy = I_{n,0}(s-1).$$

Hence, using (12) and (7),


$$I_{n,0}(s) = I_{n,0}(0) = \int h_0(y)\,h_n(y)\,dy = h_{n+1} = (2\pi)^{-1/2}(n+2)^{-3/2} \qquad (0 \le s \le n,\ n \ge 0). \tag{17}$$
Substituting this result into (15) and noting (4), we obtain


$$\mu'_0(n+2) = \frac{n+1}{n+2} = 1 - P_{n+2}(0)$$


for the zero order moment, as is to be expected when one recalls that $P_n(u)$ is
zero for $u < 0$ and has a saltus at $u = 0$.

For $I_{n,1}(s)$ we proceed in the same manner but after reversing the order of
integration we apply the first of the two reduction formulae (10) for the
derivative to obtain


$$I_{n,1}(s) - I_{n,1}(s-1) = J_n(s-1) \qquad (1 \le s \le n), \tag{18}$$


where

$$J_n(s) = \int h_s(u)\,h'_{n-s}(u)\,du \qquad (0 \le s \le n).$$


Similarly we obtain a difference equation for $J_n(s)$ by applying the reduction
formula (9) to $h_s(u)$ in the integrand of $J_n(s)$, reversing the order of
integration, applying the second form of formula (10) to $h'_{n-s}(u)$, and simplifying the
result by using (12). The resulting difference equation is


$$J_n(s) - J_n(s-1) = -h_s h_{n-s} \qquad (1 \le s \le n)$$


which when summed over s gives

$$J_n(t) = J_n(0) - \sum_{s=1}^{t} h_s h_{n-s}.$$

By Lemma 4 $J_n(0) = I_{n,1}(0) - h_0 h_n$ so we may write


$$J_n(t) = I_{n,1}(0) - \sum_{s=0}^{t} h_s h_{n-s}. \tag{19}$$
Returning to (18) we substitute (19) for $J_n(s)$ and sum this difference equation
for $I_{n,1}(s)$ over s from 1 to t, reversing the order of summation to obtain


$$I_{n,1}(t) = (t+1)\,I_{n,1}(0) - \sum_{v=0}^{t-1}(t-v)\,h_v h_{n-v} \qquad (1 \le t \le n). \tag{20}$$


We note that this expression is also valid for $t = 0$ if we write the range of
summation for v from 0 to t. Since $I_{n,1}(s) = I_{n,1}(n-s)$ by definition (16), we
have upon putting $t = n$ in this extended form of (20)


$$I_{n,1}(0) = (n+1)\,I_{n,1}(0) - \sum_{v=0}^{n}(n-v)\,h_v h_{n-v}.$$

But

$$\sum_{v=0}^{n} v\,h_v h_{n-v} = \sum_{v=0}^{n}(n-v)\,h_v h_{n-v} = n\sum_{v=0}^{n} h_v h_{n-v} - \sum_{v=0}^{n} v\,h_v h_{n-v},$$

so that

$$\sum_{v=0}^{n}(n-v)\,h_v h_{n-v} = \frac{n}{2}\sum_{v=0}^{n} h_v h_{n-v}$$


and


$$I_{n,1}(0) = \frac12\sum_{v=0}^{n} h_v h_{n-v}. \tag{21}$$


Substituting (21) into the extended form of (20) and summing over t, we have
after reversing the order of summation


$$\sum_{t=0}^{n} I_{n,1}(t) = \frac{(n+1)(n+2)}{4}\sum_{v=0}^{n} h_v h_{n-v} - \sum_{v=0}^{n}\frac{(n-v)(n-v+1)}{2}\,h_v h_{n-v}.$$


In this last expression we note that, since $h_v h_{n-v}$ is unchanged when v is
replaced by n - v, the coefficient of $h_v h_{n-v}$ is


$$\frac{(n+1)(n+2)}{4} - \frac{(n-v)(n-v+1)}{4} - \frac{v(v+1)}{4} = \frac12 (v+1)(n-v+1) \qquad (0 \le v \le n),$$


so that, using Lemma 1, we may write


$$\sum_{s=0}^{n} I_{n,1}(s) = \frac12\sum_{s=0}^{n}(s+1)(n-s+1)\,h_s h_{n-s}
 = \frac{1}{4\pi}\sum_{s=1}^{n+1} s^{-1/2}(n-s+2)^{-1/2} \qquad (n \ge 0).$$


Substituting this into (15) we obtain

$$\mu'_1(n+2) = \frac12\sqrt{\frac{n+2}{2\pi}}\sum_{s=1}^{n+1} s^{-1/2}(n+2-s)^{-1/2} \qquad (n \ge 0) \tag{22}$$


for the first moment of the maximum and, hence, twice this value for the mean
range of the adjusted partial sums.


For $I_{n,2}(s)$ the method is similar though more tedious. Applying (9) to $h_s(u)$
in the integrand of $I_{n,2}(s)$, reversing the order of integration, and applying the
first form of (11) to the inner integral we have


$$I_{n,2}(s) = I_{n,2}(s-1) + K_n(s-1) + 2L_n(s-1) + I_{n,0}(s-1) \qquad (1 \le s \le n), \tag{23}$$

where


$$K_n(s) = \int h_s(u)\,h''_{n-s}(u)\,du, \qquad L_n(s) = \int u\,h_s(u)\,h'_{n-s}(u)\,du \qquad (0 \le s \le n).$$


We obtain, in the same way as was done for $J_n(s)$ [but using the second and
third forms of (11)] difference equations for $L_n(s)$ and $K_n(s)$ respectively

$$L_n(s) - L_n(s-1) = K_n(s-1), \qquad K_n(s) - K_n(s-1) = h_{n-s}h'_s - h_s h'_{n-s} \qquad (1 \le s \le n).$$


Using Lemma 4 to evaluate $L_n(0)$ and $K_n(0)$, we find the solutions of these
equations to be, for $0 \le s \le n$,


$$L_n(s) = (s+1)\left[I_{n,2}(0) - I_{n,0}(0)\right] + \sum_{v=0}^{s}(s-v)\left[h_{n-v}h'_v - h_v h'_{n-v}\right],$$


$$K_n(s) = I_{n,2}(0) - I_{n,0}(0) + \sum_{v=0}^{s}\left[h_{n-v}h'_v - h_v h'_{n-v}\right].$$


Substituting these expressions into (23) we have after summing over s and
rearranging


$$I_{n,2}(t) = (t+1)^2 I_{n,2}(0) - t(t+1)\,I_{n,0}(0) + \sum_{v=0}^{t}(t-v)^2\left[h_{n-v}h'_v - h_v h'_{n-v}\right] \qquad (0 \le t \le n). \tag{24}$$


Since $I_{n,2}(s) = I_{n,2}(n-s)$, we now evaluate $I_{n,2}(0)$ as before obtaining


$$I_{n,2}(0) = \frac{n+1}{n+2}\,I_{n,0}(0) + \frac{1}{n+2}\sum_{v=0}^{n} v\left[h_{n-v}h'_v - h_v h'_{n-v}\right],$$


after we have observed that, as for $I_{n,1}(0)$, the summations involved satisfy
certain identities, in particular

$$\sum_{v=0}^{n}(n-v)^2\left[h_{n-v}h'_v - h_v h'_{n-v}\right] = -n\sum_{v=0}^{n} v\left[h_{n-v}h'_v - h_v h'_{n-v}\right].$$

Substituting the expression for $I_{n,2}(0)$ into (24) and summing over t, we have

$$\sum_{t=0}^{n} I_{n,2}(t) = \frac{(n+1)(n+3)}{6}\,I_{n,0}(0) + \frac16\sum_{v=0}^{n}\left[v(n+1)(2n+3) + (n-v)(n-v+1)(2n-2v+1)\right]\left[h_{n-v}h'_v - h_v h'_{n-v}\right]$$

$$\qquad = \frac{(n+1)(n+3)}{6}\,I_{n,0}(0) + \frac13\sum_{v=0}^{n}(v+1)(n-v+1)(2v-n)\,h_{n-v}h'_v.$$


By (13) $h'_v = I_{v-1,1}(0)$ so that, using (7), (17) and (21), we may write


$$\sum_{s=0}^{n} I_{n,2}(s) = \frac16\left\{\frac{(n+1)(n+3)}{\sqrt{2\pi(n+2)^3}} + \frac{1}{(2\pi)^{3/2}}\sum_{s=2}^{n+1}\sum_{t=1}^{s-1}\frac{s(2s-n-2)}{\sqrt{(n+2-s)\,t^3(s-t)^3}}\right\} \qquad (n \ge 0)$$


provided we interpret the summation as zero when $n = 0$. Hence from (15)
the second moment of the maximum is


$$\mu'_2(n+2) = \frac16\left\{\frac{(n+1)(n+3)}{n+2} + \frac{\sqrt{n+2}}{2\pi}\sum_{s=2}^{n+1}\sum_{t=1}^{s-1}\frac{s(2s-n-2)}{\sqrt{(n+2-s)\,t^3(s-t)^3}}\right\} \qquad (n \ge 0). \tag{25}$$


A table of values of $\mu'_1$, $\mu'_2$ and $\sigma$ for samples of size 10, 20, ..., 150 was
computed from formulae (22) and (25) (see Table 1).


5. Asymptotic values for the first and second moments. Feller [2] has
considered the asymptotic distribution of the adjusted range for large n and found
the asymptotic value of the mean adjusted range to be $[(\pi/2)n]^{1/2}$. That this is
in agreement with our result can be seen by approximating to the sum in our
formula (22)


$$\mu'_1(n) = \frac12\sqrt{\frac{n}{2\pi}}\sum_{s=1}^{n-1} s^{-1/2}(n-s)^{-1/2}$$

by the integral

$$\int_1^{n-1} s^{-1/2}(n-s)^{-1/2}\,ds.$$


On making the substitution $n\theta = s$, this integral becomes


$$\int_{1/n}^{1-1/n}\theta^{-1/2}(1-\theta)^{-1/2}\,d\theta$$

which approaches $B(\tfrac12, \tfrac12) = \pi$ as n becomes large. Thus the asymptotic value
for the mean of the maximum of the adjusted partial sums is


$$\mu'_1(n) \sim \frac12\sqrt{\frac{\pi n}{2}} = 0.6267\,n^{1/2} \tag{26}$$


and the asymptotic mean range is $[(\pi/2)n]^{1/2}$ as obtained by Hurst and Feller.
Similarly we obtain the asymptotic value of the second moment of the
maximum from our formula (25)


$$\mu'_2(n) = \frac16\left\{\frac{n^2-1}{n} + \frac{\sqrt{n}}{2\pi}\sum_{s=2}^{n-1}\sum_{t=1}^{s-1}\frac{s(2s-n)}{\sqrt{(n-s)\,t^3(s-t)^3}}\right\}$$

by approximating to the double sum by the double integral

$$\int_2^{n-1}\int_1^{s-1}\frac{s(2s-n)}{\sqrt{(n-s)\,t^3(s-t)^3}}\,dt\,ds.$$

Integrating first with respect to t by means of the substitution $t = sz$, we
have

$$4\int_2^{n-1}\frac{(2s-n)(s-2)}{s\sqrt{(n-s)(s-1)}}\,ds$$


n—l 
/s—1 OO Ol Teena 
: | l n—8 s-—l Vande bh wfnacraend ~ 


Applying the substitution $s = n\sin^2\theta + \cos^2\theta$ to the first three terms and the
substitution $s = 2n[n + 1 + (n-1)\sin\theta]^{-1}$ to the last term of this integrand,


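TABLE 1

Values of $\mu'_1(n)$, $\mu'_2(n)$, $\sigma_n$ and the asymptotic approximations for
$\mu'_1(n)$ and $\sigma_n$

(Columns: n; exact values of $\mu'_1(n)$, $\mu'_2(n)$, $\sigma_n$; asymptotic approximations
$\mu'_1 \sim 0.6267\,n^{1/2}$ and $\sigma_n \sim 0.3276\,n^{1/2}$. Only the following entries of the
asymptotic columns are legible.)

   n     0.6267 n^{1/2}     0.3276 n^{1/2}
  10        1.9817             1.0359
  20                           1.4649
  30                           1.7942
  40                           2.0717
  50                           2.3162
  60                           2.5373
  70                           2.7406
  80                           2.9298
  90                           3.1076
 120                           3.5883
 140                           3.8758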
+ 8Vn sin” E - | - 8V/n sin” (+).


(n — ]) n—1 


which approaches $-12\pi + 4\pi\sqrt{n}$ for large n. Thus the asymptotic value for
the second moment about the origin is


$$\mu'_2(n) \sim \frac{n}{2} - \sqrt{n}$$


and, using the asymptotic value obtained for $\mu'_1(n)$ in (26), we find the
asymptotic value of the variance of the maximum of the adjusted partial sums is


$$\sigma_n^2 \sim \left(\frac12 - \frac{\pi}{8}\right)n \approx 0.1073\,n. \tag{27}$$

Comparing this asymptotic value with that obtained by Anis [4] for the
variance of the maximum of the unadjusted sums which was $[1 - (2/\pi)]n =
0.3634n$, we see that Feller's comment on the greater stability of the adjusted
partial sums is well borne out by our results.

In Table 1 we note that: 

1) the series for $\mu'_1(n)$ converges very slowly so that the asymptotic
approximation (26) should not be used for values of n within the range
of this table,

2) the series for $\mu'_2(n)$ converges even more slowly, but

3) the asymptotic approximation (27) gives very good values for $\sigma_n$ even
within the range of the table because the errors in the approximations
to $\mu'_1(n)$ and $\mu'_2(n)$ are in the same direction and largely cancel.
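The three remarks can be illustrated numerically. The sketch below evaluates the exact moments from the sums in (22) and (25) (written for sample size n, and assuming the reading of the double sum with $t^3(s-t)^3$ under the root) and sets them beside the approximations $0.6267\,n^{1/2}$ and $0.3276\,n^{1/2}$. Function and key names are illustrative.

```python
import math

def exact_moments(n):
    """Return (mu1, mu2) for the maximum of the adjusted partial sums, n >= 2."""
    s1 = sum((s * (n - s)) ** -0.5 for s in range(1, n))
    mu1 = 0.5 * math.sqrt(n / (2.0 * math.pi)) * s1
    s2 = 0.0
    for s in range(2, n):
        inner = sum((t * (s - t)) ** -1.5 for t in range(1, s))
        s2 += s * (2 * s - n) * (n - s) ** -0.5 * inner
    mu2 = ((n * n - 1.0) / n + math.sqrt(n) / (2.0 * math.pi) * s2) / 6.0
    return mu1, mu2

def compare(n):
    """Exact mean and standard deviation next to their asymptotic approximations."""
    mu1, mu2 = exact_moments(n)
    return {
        "n": n,
        "mu1": mu1,
        "mu1_asym": 0.6267 * math.sqrt(n),
        "sigma": math.sqrt(mu2 - mu1 * mu1),
        "sigma_asym": 0.3276 * math.sqrt(n),
    }

if __name__ == "__main__":
    for n in (10, 50, 150):
        print(compare(n))
```

At n = 10 the approximation to the mean is far too large, while the approximation to the standard deviation is already close, in line with remarks 1) and 3).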
REMARKS CONCERNING CHARACTERISTIC FUNCTIONS
By Eugene Lukacs


Office of Naval Research¹


1. Summary. In the first part of this note, we study functions of characteristic 
functions which are themselves characteristic functions and discuss also a prop- 
erty of analytic characteristic functions. In the second part, an example is con- 
structed to answer a question raised by D. Dugué [3]. 


2. Functions of characteristic functions. Let F(x) be a distribution function,
that is, a never-decreasing function which is continuous to the right and is such
that $F(-\infty) = 0$ while $F(+\infty) = 1$. Its Fourier transform


$$\phi(t) = \int_{-\infty}^{\infty} e^{itx}\,dF(x) \tag{1}$$


is called the characteristic function of the distribution F(x). Characteristic func- 
tions are very important in probability theory, and in the following discussion 
we shall use some of their well-known properties which may be found in books 
[1], [5] on the subject. We first derive a theorem which shows how given charac- 
teristic functions may be transformed into new characteristic functions. 

THEOREM 1. Let $\{\phi_\nu(t)\}$ be an arbitrary sequence of characteristic functions and
$\{a_\nu\}$ be a sequence of real numbers. The necessary and sufficient condition that


$$f(t) = \sum_{\nu=0}^{\infty} a_\nu\,\phi_\nu(t) \tag{2}$$


should be a characteristic function for every sequence $\{\phi_\nu(t)\}$ of characteristic
functions is that


$$a_\nu \ge 0, \qquad \sum_{\nu=0}^{\infty} a_\nu = 1. \tag{3}$$
We first show that the condition is sufficient. Let m ($m \ge 0$) be a subscript
such that $a_m$ is the first non-vanishing element of the sequence $\{a_\nu\}$. We denote
by


$$g_n(t) = \left[\sum_{\nu=0}^{m+n} a_\nu\,\phi_\nu(t)\right]\Big/\left[\sum_{\nu=0}^{m+n} a_\nu\right] \qquad \text{for } n = 0, 1, 2, \dots.$$


If (3) is satisfied, then $g_n(t)$ is a linear combination of a finite number of
characteristic functions. The coefficients in this linear combination are non-negative
and their sum is one; therefore $g_n(t)$ is also a characteristic function. We see
then from P. Lévy's continuity theorem that $f(t) = \lim_{n\to\infty} g_n(t)$ is also a
characteristic function.

Received June 17, 1955.
¹ Now at The Catholic University of America.

To prove the necessity of the condition, we assume that f(t) as given by (2)
is a characteristic function for any sequence $\phi_\nu(t)$ of characteristic functions.
Let $\phi_\nu(t) = e^{it\nu}$; then $f(t) = \sum_{\nu=0}^{\infty} a_\nu e^{it\nu}$. This is the Fourier transform of a step
function with jumps $a_\nu$ at the points $\nu = 0, 1, 2, \dots$. Since f(t) is by assumption
a characteristic function, this step function must be a discrete probability
distribution; therefore $a_\nu \ge 0$, $\sum a_\nu = 1$, so that Theorem 1 is established.

An application of some interest is obtained by putting $\phi_n(t) = n^{-it} =
\exp[-it(\ln n)]$ and $a_n = n^{-\sigma}/\sum_{n=1}^{\infty} n^{-\sigma}$, where $\sigma > 1$. It follows then from Theorem 1
that the function $f(t) = \zeta(\sigma + it)/\zeta(\sigma)$ is a characteristic function for $\sigma > 1$.
Here $\zeta(s) = \sum_{n=1}^{\infty} n^{-s}$ is Riemann's zeta function and $s = \sigma + it$, with $\sigma$, t real
and $\sigma > 1$.
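As a rough numerical illustration, the mixture can be evaluated with the series truncated at a finite number of terms and the weights renormalized. The truncation and the names below are assumptions of this sketch, not part of the argument. A characteristic function must equal 1 at the origin and never exceed 1 in modulus, and the truncated mixture has both properties because it is itself a convex combination of the functions $n^{-it}$.

```python
import cmath
import math

def zeta_cf(t, sigma=2.0, terms=2000):
    """Truncated version of zeta(sigma + it) / zeta(sigma), written as a mixture."""
    weights = [n ** -sigma for n in range(1, terms + 1)]
    total = sum(weights)
    # each component n^(-it) = exp(-it ln n) is the c.f. of a point mass at -ln n
    mixture = sum(w * cmath.exp(-1j * t * math.log(n))
                  for n, w in enumerate(weights, start=1))
    return mixture / total

if __name__ == "__main__":
    for t in (0.0, 1.0, 5.0):
        print(t, zeta_cf(t), abs(zeta_cf(t)))
```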

This result was already obtained in a different manner by B. V. Gnedenko
and A. N. Kolmogorov ([7], p. 75) who showed that $\zeta(\sigma + it)/\zeta(\sigma)$ is the
characteristic function of an infinitely divisible distribution.

Next, we let $\phi(t)$ be an arbitrary characteristic function and put $\phi_\nu(t) = [\phi(t)]^\nu$,
$\nu = 0, 1, 2, \dots$. We obtain then

COROLLARY TO THEOREM 1. Let $\phi(t)$ be a characteristic function and let G(z)
be a function of the complex variable z, which is regular in $|z| < R$ where $R > 1$.
The function $G[\phi(t)]$ is also a characteristic function if, and only if, G(z) has a
power-series expansion about the origin with non-negative coefficients and if
$G(1) = 1$.

It is worth while to remark that the class of functions G(z) which have the
property that $G[\phi(t)]$ is a characteristic function whenever $\phi(t)$ is a characteristic
function includes also functions which are not analytic. An example is the
function $G(z) = |z|^2$. The restriction in the corollary that G(z) should be regular is
therefore somewhat artificial.

DEFINITION. A distribution is said to be infinitely divisible if for every positive
integer n its characteristic function is the nth power of some characteristic
function.

By means of the corollary to Theorem 1 we obtain the following result. 

THEOREM 2. Let $\phi(t)$ be an arbitrary characteristic function and p a real number
such that $p > 1$; then


$$\psi(t) = \frac{p-1}{p-\phi(t)} \tag{4}$$


is the characteristic function of an infinitely divisible law.
To prove Theorem 2 we let n be a positive integer and consider the function


$$G(z) = \left[\frac{p-1}{p-z}\right]^{1/n}.$$
Here it is understood that G(z) is the principal value of the power on the
right-hand side. Clearly $G(1) = 1$.
We expand G(z) according to the binomial theorem and see that


$$G(z) = \left[\frac{p-1}{p}\right]^{1/n}\left\{1 + \sum_{k=1}^{\infty}\frac{(1+n)(1+2n)\cdots(1+[k-1]n)}{k!\,n^k}\left(\frac{z}{p}\right)^k\right\}.$$


This shows that for any positive integer n the conditions of the corollary are
satisfied. The function $G[\phi(t)] = \{[p-1]/[p-\phi(t)]\}^{1/n}$ is therefore a
characteristic function for any positive integer n; in other words, $\psi(t)$, as given by (4), is the
characteristic function of an infinitely divisible law.
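The two conditions of the corollary can be checked numerically for the binomial expansion of $[(p-1)/(p-z)]^{1/n}$. This is a minimal sketch; the function names and the number of terms kept are choices made for this illustration. The coefficients are generated by the recurrence $c_k = c_{k-1}\,(1 + (k-1)n)/(k\,n\,p)$, which follows from the product form of the k-th binomial coefficient.

```python
def theorem2_coefficients(p, n, terms):
    """Power-series coefficients of G(z) = [(p - 1) / (p - z)]**(1/n) about z = 0."""
    lead = ((p - 1.0) / p) ** (1.0 / n)
    coeffs = [lead]
    c = 1.0
    for k in range(1, terms):
        c *= (1.0 + (k - 1) * n) / (k * n * p)   # ratio of consecutive coefficients
        coeffs.append(lead * c)
    return coeffs

def series_value(coeffs, z):
    """Evaluate the truncated power series at z."""
    return sum(c * z ** k for k, c in enumerate(coeffs))

if __name__ == "__main__":
    coeffs = theorem2_coefficients(p=2.0, n=3, terms=200)
    print(min(coeffs) >= 0.0, series_value(coeffs, 1.0))
```

All coefficients are non-negative and the series sums to 1 at z = 1, which are the two requirements of the corollary.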

In a similar manner we derive from the corollary to Theorem 1 a theorem 
which is due to Bruno de Finetti [4]. 

THEOREM OF DE FINETTI. If $\phi(t)$ is an arbitrary characteristic function, and if
p is a positive real number, then $\psi(t) = \exp\{p[\phi(t) - 1]\}$ is the characteristic
function of an infinitely divisible law.

The function $G(z) = e^{p(z-1)}$ satisfies the assumptions of the corollary, so that
we see immediately that $\psi(t)$ is a characteristic function for any $p > 0$. It follows
then from its functional form that it must be the characteristic function of an
infinitely divisible law.


3. A remark concerning analytic characteristic functions. A characteristic 
function is said to be an analytic characteristic function if it is an analytic func- 
tion which coincides in some neighborhood of the origin with a characteristic 
function. 

In an earlier paper [6] the following result was obtained: 

THEOREM 4 of [6]. Let $\phi(t)$ be the characteristic function of an infinitely divisible
law and assume that $\phi(t)$ is an analytic characteristic function. Then $\phi(t)$ has no
zeros inside its strip of convergence.

In the following we show that this statement cannot be improved. This is 
done by constructing an analytic characteristic function of an infinitely divisible 
law which has zeros on the boundary of its strip of convergence. 

Let $a > 0$, $b > 0$ be two real numbers and put $w = a + ib$; the function


$$\phi(t) = \frac{(1 - it/w)(1 - it/\bar w)}{(1 - it/a)^2} \tag{5}$$

is then a characteristic function. This is seen immediately if we write $\phi(t) =
p + (1-p)(1 - it/a)^{-2}$, where $p = a^2/(a^2 + b^2)$. We define

$$m = 2\int_0^\infty \frac{e^{-ax}(1 - \cos bx)}{1 + x^2}\,dx, \tag{6a}$$

$$M(x) = \begin{cases} 0 & \text{for } x < 0, \\[4pt] -2\displaystyle\int_x^\infty e^{-at}(1 - \cos bt)\,\frac{dt}{t} & \text{for } x > 0. \end{cases} \tag{6b}$$


The function M(x) is real and non-decreasing in $(-\infty, 0)$ and in $(0, \infty)$.
Moreover, $M(-\infty) = M(+\infty) = 0$; and the integral $\int u^2\,dM(u)$ is finite over every
finite interval. According to P. Lévy's representation theorem ([5], p. 180), the
function


$$\psi(t) = imt + \int_{+0}^{\infty}\left(e^{itx} - 1 - \frac{itx}{1+x^2}\right)dM(x) \tag{7}$$


is the logarithm of the characteristic function of an infinitely divisible law. We
write


$$I(t) = \int_0^\infty\left(e^{itx} - 1 - \frac{itx}{1+x^2}\right)e^{-ax}(1 - \cos bx)\,\frac{dx}{x}$$


and obtain $\psi(t) = imt + 2I(t)$.
It is easily seen that it is permissible to differentiate I(t) under the integral
sign. A simple computation gives


Considering $I(0) = 0$, we see that $\psi(t) = \log\phi(t)$, where $\phi(t)$ is given by (5).
We finally remark that it is possible to use Theorem 2 to construct
characteristic functions of infinitely divisible laws which have zeros arbitrarily close to
the boundary of the strip of convergence. As an example we mention $\psi(t) =
(p-1)/[p - \phi(t)]$, where $\phi(t) = (1 - it/a)^{-1}$. The function $\psi(t)$ then has
the zero $t_0 = -ia$, and the boundary of its region of convergence is the line
$\mathrm{Im}(t) = -a(p-1)/p$. By selecting p large enough, the distance between
the boundary and $t_0$ can be made arbitrarily small.


4. A question raised by D. Dugué. In this section we are concerned with certain
factorizations of non-infinitely divisible laws. The uniform (rectangular)
distribution has the characteristic function $(\sin t)/t$; it is not infinitely divisible, since
it has real zeros. It is well known that it has the factor $\sin(t/n)/(t/n)$ for every
positive integer n. The uniform distribution is therefore an example of a law
which is not infinitely divisible but has an enumerable infinity of different factors.
These factors, with characteristic function $\sin(t/n)/(t/n)$, depend on a discrete
parameter n.

In a recent paper [3], D. Dugué raised the question of whether there exists a 
law which is not infinitely divisible but has a non-enumerable set of factors 
depending on a continuous parameter. As an example for such a distribution 
Dugué uses the Laplace distribution. This example is, however, invalid, since 
the Laplace distribution is infinitely divisible. The purpose of this section is to 
answer Dugué’s question in the affirmative by giving an example of a probability 
law with the desired properties. This example will be a rational characteristic 
function; for its construction we use the following lemma. 
LEMMA 1. Let


$$\phi(t) = \frac{(1 + it/w)(1 + it/\bar w)}{(1 - it/a)(1 - it/v)(1 - it/\bar v)},$$


where $v = a + ib$, $w = \alpha + i\beta$, and $a > 0$, $b > 0$, $\alpha > 0$, $\beta > 0$. The function
$\phi(t)$ is a characteristic function if, and only if, one of the following two, mutually
exclusive, conditions holds:


(i) $\beta = \sqrt{b^2 - (a+\alpha)^2} \ge \sqrt{3}\,(a+\alpha)$;
(ii) $\beta \ne \sqrt{b^2 - (a+\alpha)^2}$ and simultaneously $\beta^2 \ge (a+\alpha)^2 + b^2/2$.
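PROOF. We denote by


$$f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\phi(t)\,dt$$


and obtain by a simple computation


$$f(x) = \begin{cases} C e^{-ax}\left\{1 - \left[\dfrac{c^2 - b^2}{c^2}\right]\cos bx - \dfrac{2bd}{c^2}\sin bx\right\} & \text{if } x > 0; \\[6pt] 0 & \text{otherwise.} \end{cases} \tag{10}$$


Here

$$d = a + \alpha, \qquad c^2 = (a+\alpha)^2 + \beta^2 = d^2 + \beta^2, \qquad C = \frac{a\,v\bar v\,c^2}{b^2\,w\bar w}. \tag{10a}$$

The function f(x) is real and $\int_{-\infty}^{\infty} f(x)\,dx = 1$. Therefore we conclude that the
function $\phi(t)$ is a characteristic function if, and only if, the trigonometric
polynomial


$$h(x) = 1 - \left[\frac{c^2 - b^2}{c^2}\right]\cos bx - \frac{2bd}{c^2}\sin bx \tag{11}$$


is nonnegative.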

We assume first that $c^2 = b^2$ or, considering (10a), that $\beta^2 = b^2 - (a+\alpha)^2$.
Then $h(x) = 1 - (2d/b)\sin bx$ and $h(x) \ge 0$ if, and only if, $2d/b \le 1$. It is seen
by simple algebra that this is equivalent to condition (i). We suppose next that
$c^2 \ne b^2$ and determine by elementary considerations the smallest value of h(x),
which is

$$\min_x h(x) = 1 - \frac{|c^2 - b^2|}{c^2}\,\frac{1}{\cos bx_0},$$

where $-\pi/2b < x_0 < +\pi/2b$ and $\tan bx_0 = 2bd/(c^2 - b^2)$. The function
h(x) is therefore non-negative if, and only if, $1 \ge |c^2 - b^2|/(c^2\cos bx_0)$. This
condition leads easily to (ii), so that the lemma is established.
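The criterion can be illustrated by evaluating the trigonometric polynomial (11) on a grid over one period. This is a minimal sketch with illustrative parameter values and function names; it assumes the reading $h(x) = 1 - [(c^2-b^2)/c^2]\cos bx - (2bd/c^2)\sin bx$ with $d = a + \alpha$ and $c^2 = d^2 + \beta^2$. Since the amplitude of the oscillating part is $\sqrt{(c^2-b^2)^2 + 4b^2d^2}/c^2$, non-negativity of h amounts to $\beta^2 \ge (a+\alpha)^2 + b^2/2$, and in the boundary case $c^2 = b^2$ this inequality reduces to condition (i).

```python
import math

def h_min(a, b, alpha, beta, samples=20000):
    """Smallest value of h(x) from (11) on a grid covering one period 2*pi/b."""
    d = a + alpha
    c2 = d * d + beta * beta
    period = 2.0 * math.pi / b
    return min(
        1.0 - (c2 - b * b) / c2 * math.cos(b * x) - 2.0 * b * d / c2 * math.sin(b * x)
        for x in (period * i / samples for i in range(samples))
    )

def criterion_holds(a, b, alpha, beta):
    """Inequality beta^2 >= (a + alpha)^2 + b^2 / 2, which covers cases (i) and (ii)."""
    return beta * beta >= (a + alpha) ** 2 + b * b / 2.0

if __name__ == "__main__":
    for beta in (1.0, 3.0):
        print(beta, criterion_holds(1.0, 1.0, 1.0, beta), h_min(1.0, 1.0, 1.0, beta))
```

With a = b = α = 1 the polynomial dips below zero for β = 1 and stays positive for β = 3, matching the inequality.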

We use this lemma, together with the theorem quoted at the beginning of 
Section 3, to construct a characteristic function with the desired properties. 


Let $a, \alpha_1, \alpha_2, \beta_1, \beta_2, b$ be arbitrary positive numbers such that $\alpha_2 > \alpha_1$ and
also


$$\beta_j^2 > \max\left[\,b^2 - (a+\alpha_j)^2;\ (a+\alpha_j)^2 + \frac{b^2}{2}\right] \qquad (j = 1, 2). \tag{12}$$


We define $v = a + ib$, $w_1 = \alpha_1 + i\beta_1$, and $w_2 = \alpha_2 + i\beta_2$. The functions

$$\phi_j(t) = \frac{(1 + it/w_j)(1 + it/\bar w_j)}{(1 - it/a)(1 - it/v)(1 - it/\bar v)} \qquad (j = 1, 2)$$


are then characteristic functions, since they satisfy condition (ii) of the
preceding lemma. Therefore


$$\phi_3(t) = \phi_1(t)\phi_2(t) = \frac{(1 + it/w_1)(1 + it/\bar w_1)(1 + it/w_2)(1 + it/\bar w_2)}{\left[(1 - it/a)(1 - it/v)(1 - it/\bar v)\right]^2} \tag{13}$$


is also a characteristic function. It is also known that


$$\phi_4(t) = \frac{1}{(1 + it/\alpha_2)(1 + it/w_2)(1 + it/\bar w_2)}$$


is a characteristic function; we conclude then that this is also true for $\phi(t) =
\phi_3(t)\phi_4(t)$; i.e.,


$$\phi(t) = \frac{(1 + it/w_1)(1 + it/\bar w_1)}{(1 + it/\alpha_2)\left[(1 - it/a)(1 - it/v)(1 - it/\bar v)\right]^2}. \tag{14}$$

The function $\phi(t)$ is an analytic characteristic function which has the strip
$\alpha_2 > \mathrm{Im}(t) > -a$ as its strip of convergence. Its zeros $iw_1$ and $i\bar w_1$ are located
inside this strip, so that $\phi(t)$ cannot be the characteristic function of an infinitely
divisible law. Similarly, $\phi_3(t)$ is not infinitely divisible, since its strip of convergence
is the half plane $\mathrm{Im}(t) > -a$ in which it has four zeros. We have, therefore, an
example of a law $\phi(t)$, which is not infinitely divisible and which has, nevertheless, a
non-enumerable infinity of not infinitely divisible factors $\phi_3(t)$. These factors depend
on a continuous parameter $\beta_2$, which is subject only to the restriction (12).
THE DISTRIBUTION OF THE RATIOS OF CERTAIN QUADRATIC FORMS
IN TIME SERIES¹


By Seymour Geisser²


University of North Carolina


1. Introduction. In testing the hypothesis that successive members of a series 
of observations are serially correlated a number of statistics have been proposed. 
Durbin and Watson [4] gave the exact distribution of several of these statistics 
when they are slightly modified. We shall extend the work of Durbin and Watson 
for a non-null case of two of their modified statistics and also find a simple ex- 
pression for the moments of another of their statistics. 


2. The Double root result. Assume that $X' = (x_1, x_2, \dots, x_n)$ has probability
density


$$f(X) = |\Lambda|^{1/2}(2\pi)^{-n/2}\exp[-X'\Lambda X/2], \tag{2.1}$$


where $\Lambda$ is a positive definite matrix and $n = 2m$. Let


$$A = \begin{pmatrix} A_1 & 0 \\ 0 & A_1 \end{pmatrix}, \qquad B = \begin{pmatrix} B_1 & 0 \\ 0 & B_1 \end{pmatrix}, \qquad \Lambda = \begin{pmatrix} \Lambda_1 & 0 \\ 0 & \Lambda_1 \end{pmatrix}, \tag{2.2}$$


where $B_1$ is positive definite or positive semi-definite and of rank m - q which
is $\ge$ the rank of $A_1$, a real symmetric matrix. Further assume that A, B, and
$\Lambda$ commute pairwise, and that the characteristic roots $a_j$ of A and the
characteristic roots $b_j$ of B are so numbered that if $a_j = 0$, $b_j > 0$ and $a_j/b_j \ge a_{j+1}/b_{j+1}$
for all $a_j$ and $a_{j+1}$ which are $\ne 0$.
Now

$$G(z) = P\left[\frac{X'AX}{X'BX} \le z\right] = P[X'(A - zB)X \le 0], \tag{2.3}$$

where X is $N(0, \Lambda^{-1})$. Making an orthogonal transformation $X = PY$ where
$P'AP = D_a$, $P'BP = D_b$, $P'\Lambda P = D_\lambda$ are diagonal matrices with elements
$a_j = a_{m+j}$, $b_j = b_{m+j}$ and $\lambda_j = \lambda_{m+j}$, we get


$$G(z) = P[Y'(D_a - zD_b)Y \le 0], \tag{2.4}$$

where Y is $N(0, D_\lambda^{-1})$. Now let $Y = D_\lambda^{-1/2}W$ so that


$$G(z) = P[W'(D_a - zD_b)D_\lambda^{-1}W \le 0], \tag{2.5}$$


where W is $N(0, I)$. Hence by the duplication of roots $a_j$, $b_j$ and $\lambda_j$ we get


$$G(z) = P\left[\sum_{j=1}^{m} w_j\lambda_j^{-1}(a_j - zb_j) \le 0\right] \tag{2.6}$$


where the $w_j$ are independent and each is a $\chi^2$ variable with 2 degrees of freedom.

Received June 28, 1955; revised January 3, 1957.

Using a result by R. L. Anderson [1] which in general terms states that if the
$c_j$ are all different then


$$P\left[\sum_{j=1}^{p} c_j w_j > 0\right] = \sum_{j \in S}\frac{c_j^{\,p-1}}{\prod_{k \ne j}(c_j - c_k)}, \tag{2.7}$$

where $S = \{j \mid c_j > 0\}$, we find that


m—q L 
Giz) = 1-—- Il Aj (a, — bez)” A 
jl ee 
(2.8) Sat 
° II [Aj(ax — baz) — Ala; —b;z)], 
aed 


jmtk 


onaeese 
bial 
for L = 1,---,m — q. This result could also have been gotten by contour 


integration through the results of Gurland [6] or by Madow’s generalization of 
Anderson’s result. 
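Anderson's result (2.7) is easy to evaluate exactly. The sketch below assumes the form of (2.7) in which the probability that $\sum c_j w_j$ is positive equals the sum, over the positive $c_j$, of $c_j^{p-1}/\prod_{k\ne j}(c_j - c_k)$, with the $w_j$ independent $\chi^2$ variables with 2 degrees of freedom and the $c_j$ distinct and non-zero; the function name is illustrative. Small cases can be checked by hand: for $c = (1, -1)$ the answer is 1/2 by symmetry, and for $c = (1, -1, -2)$ a direct computation with exponential variables gives 1/6.

```python
from fractions import Fraction

def prob_positive(c):
    """P[sum_j c_j w_j > 0] for independent chi-square(2) w_j and distinct non-zero c_j."""
    p = len(c)
    total = Fraction(0)
    for j, cj in enumerate(c):
        if cj > 0:                                # only indices in S = {j : c_j > 0}
            denom = Fraction(1)
            for k, ck in enumerate(c):
                if k != j:
                    denom *= Fraction(cj) - Fraction(ck)
            total += Fraction(cj) ** (p - 1) / denom
    return total

if __name__ == "__main__":
    for coeffs in ([1, -1], [2, -1], [1, -1, -2], [1, 2, 3]):
        print(coeffs, prob_positive(coeffs))
```

When all coefficients are positive the terms sum to 1, as they must, and when none is positive the sum is empty and the probability is 0.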


3. The distribution of the Durbin and Watson Statistic in the non-null case. 
Using the result (2.8) and letting 
2m—1 2m 
(3.1) R= a Legs Zs 2 2, 
inm sa 
where 


1+) 


(3.2) 
; ~~ 
=-—p 1+ 
we may find the distribution of R. For a; = cos (jr)/(m + 1), A; = 
2pa; , bj = 1. Now [7 d; = (1 — "(1 — 9’) and & — Ay 
‘(1 + p’ — 2pz) and by Geisser [5] 
kr 


(3.3) I (a, — a;) = (m + 1)(—1)*"'2™ esc’ — —i° 


Jk 
Therefore


G(z;p) =1—(l—p”™™)2"(m+ 1) "(1 — p) (1 + @ — 2pz)™ 


- k+l kr ws i lhe a k 
‘2 (-) (cos 4 — 2) sin’ "(1 + 6° — *o008 At 


(3.4) 


and 
G'(z; p) = g(R; p) 
ile aa p™**)2"(m — 1)(m+ 1)"(1 — p) “(1 +p — 2pR)™ 


L m—2 kr 
. a (—3)°"* (cos = R) sin? - 


kr 
kml m+1 m+ 1’ 


oe (L + 1)4r 
2 m+l1 


For p = 0 it is clear that 


<= R € cos 


L 
g(R) = 2"(m — 1)(m + 1) ) (-1)*" 
k= 1 


: (cos . rk) sin’ _ke 
m+ 1 m+1 


(3.6) 


and hence 


(3.7) g(R;p) = (l—p™”’)(1 — op) "(1 + & — 2pR)"9(R). 


4. Approximations. In a paper by T. W. Anderson and R. L. Anderson [3] 
in which the circular serial correlation coefficient is discussed for fitted trigono- 
metric series for the mean, they have fitted the trigonometric series for semi- 
annual data to correct for variation of period two and get a quadratic form 


(4.1) q = X’CX / X'BX 


for n = 2m. 
They reduce g to the form 


2m—2 2m—2 


(4.2) DL vi / 2X vis 
j=l j=l 
where the c; are identical with the a; of the previous section. 

Therefore the distribution in this particular case for 2m observations is 
exactly the same as that for the non-circular case of Durbin and Watson for 
2m — 2 observations when p = 0. They also give the approximate distribution 
of their circular statistic as a beta distribution, and if we put 2m — 2 in place 
of 2m we get the approximate distribution density of R for 2m — 2 observations 
when p = 0 to be
(4.3) g(R) ~ K(1 — R’)’, 
where p = (m* + m)(m — 1)" — 3/2. 


5. Moments of a ratio. The previous work was based on the assumption that
$E x_i = \mu$ was known. However, if $\mu$ is unknown, one of the statistics used is the
ratio of the mean square successive difference to the variance;


2 
1) oo 


The distribution of $\eta$ is for the present too difficult for explicit evaluation.
The moments of $\eta$ have been found by Williams [10] and much light shed on
the distribution by Von Neumann [8]. However, the expression for the rth
moment given by Williams is in terms of the rth derivative of a function.

Durbin and Watson [4] suggested a modified statistic in this case for n = 2m. 

Let 


(5.2) R = X'’AX/ X’BX or R = 4(m — 1)&/{(2m — 1)s'], 


4 (4 


where 


with latent roots 
The distribution of R is given to be


L m—1 
(5.3) P(R > R’) = 2X (a — Ras" TT (a, — a) 
i 
for a4: &S R S a. This result is based on a result of Anderson’s where in 


addition to double roots there is a single root which is less than all of the double 
roots. By simplification it can be reduced to 


L 

(54) P(R> R’) = 4m > (-1)""(a, — R)"*" sin’ = cog 

k=l 2m 2m 
for $a_{L+1} \le R' \le a_L$. The moments of this ratio can be easily found for $r < 3m - 1$
since we already have evaluated the moments of the numerator in a previous
paper (Geisser [5]) and the moments of the denominator are well known. When
the successive observations are independent, the moments of this ratio are the
ratio of the moments [9]. Therefore


SNe 22m + 2r — 2)!(2m> — m — 1) 
(65) 8 = » = [(2m + r)\(2m — 1)(2m + 1) --+ (2m + 2r — 3)]° 


6. The distribution of the modified von Neumann ratio. If we consider the 
ratio 


2m—1 

2 
é (as a zi) 
- fam 
| @-_ 
89 


(6.1) 


> (2; — %)° + 4 (x; — &)° 


i.e., twice the ratio of the modified mean square successive difference to the pooled 
variance, we are able to use the Double Root Result and find the distribution 
of mo for the non-null alternative given by T. W. Anderson [2] if we further 
consider the model to be made up of two independent sets and 


A, 0 
(6.2) A= ( ), 
0 A 


l+p—p -—p 
~~ l+p 


1+? —p 
—p i+pP-—p 
As was shown previously [2], m provides a uniformly most powerful one-sided 
test. By the double root result we get, letting \j = o°A;, 
m—1
G(no; p) = 1 — 2m[1 + p* — 2p + pm)” ™ IT Nj 
(6.4) ): 


3 (=a — ao)" sin 


m—1 
G'(no; p) = g(no; p) = 2m~*(m — 2)[1 + p*? — 2p + enol” TT dj 
(6.5) 7% 


L 
.ak 
: ~ (—1)*"(a, _ mo)” sin* = ? 
kewl m 
where dz4: S m S az. For p = 0, 
L Les 
(66) Gm) = 2(m — 2)m D> (ay — mo)"-*(—1)™* sin? & 
kml m 
and 


(6.7) G' (no) >= g(no) = 2(m — 2)m™* dX (—1)*"\(q - mo)" sin’ = 


for dra: S m0 S a,. Hence 


(6.8)  G’(n0; 0) = g(m;p) = (1+ op — 2p + pm)” (I »;) 9(n0); 


and since 


(1 + p)(1 — p)? 1—,’ 
(6.9)  G'(n;p) = (1+ ¢° — 2p + pm) (1 — p™)(1 — 0°) *g(m). 


It is also quite easy to find the moments $E\eta_0^r$ when $\rho = 0$ since we have already
given the moments of the numerator for $r < 3m - 1$ and the moments of the
denominator are well known. Hence for $r < 3m - 1$


er 2** (2m? — m — r)(2m + 2r — 2)(m — 2)! 
et SS eae. 
RESTRICTION AND SELECTION IN MULTINORMAL DISTRIBUTIONS


By A. Clifford Cohen, Jr.


The University of Georgia


1. Summary. Maximum likelihood estimators of the parameters of a p-dimensional
multinormal population are derived in this paper which are applicable
when sample selection and observation is restricted with respect to $x_1$, but
otherwise unrestricted with respect to $x_2, \dots, x_p$. Restrictions imposed may consist
of truncation, censoring, or a selection which results in full observation of all
sample specimens with respect to $x_1$, but eliminates certain sample specimens
from subsequent observation with respect to $x_2, \dots, x_p$.


2. Introduction. Samples from a multidimensional universe are often obtained 
under circumstances such that observation in certain regions of the universe is 
restricted. For example, in studies of psychological traits, observation is often 
limited to individuals who have passed certain admission tests or who have been 
subjected to other screening processes. This situation likewise arises in connec- 
tion with multivariate studies of physical characteristics in which specimens 
available for observation have previously undergone some type of sorting pro- 
cedure. From such samples, it is often necessary to estimate the means, variances 
and correlation coefficients of the universe. Considering their most general 
aspects without limitation as to type of distribution, restricted or “screened” 
samples pose a broad class of estimation problems, some of which are quite 
involved. The present paper is limited to samples from a p-dimensional
multinormal distribution with probability density function

(1) $f(x_1, x_2, \dots, x_p) = (2\pi)^{-p/2}\,|\sigma^{ij}|^{1/2} \exp\Big[-\tfrac{1}{2}\sum_{i=1}^{p}\sum_{j=1}^{p} \sigma^{ij}(x_i - m_i)(x_j - m_j)\Big]$,

where the symmetric matrix $\|\sigma^{ij}\|$ of the quadratic form in the exponent is the
inverse of the variance-covariance matrix $\|\sigma_{ij}\|$, and has the positive determinant
$|\sigma^{ij}|$. Maximum likelihood estimators (estimates) for parameters of (1) are
obtained from truncated, censored and selected samples, with $x_1$ designating the
restricted variable; that is, the variable on which screening is based. Similar
estimators obtained previously ([8], [10]) for restricted samples from a bivariate
normal distribution follow as a special case of results obtained here. Results
obtained by Hotelling [12], Tukey [18], Pittman [16], and Chapman [5] guarantee
that both the method of moments and the method of maximum likelihood lead
to identical estimates in the case of truncated samples from multinormal
distributions. Hence for truncated samples we might have employed the method
of moments. However, we elected to use the method of maximum likelihood
as it permits a uniform treatment of all the various types of restricted samples
under consideration and it introduces no unusual algebraic difficulties.

In practical applications such as in studies of psychological traits, the screening
variable $x_1$ might actually be a composite score based on a battery of tests
rather than the score achieved on a single test. However, in this paper, we limit
our consideration to cases in which each of the component variables, including
$x_1$, has a univariate normal marginal distribution. In some applications, one or more
achievement scores might be involved which also are composite scores. For 
example, x, could be such a score. Here again, however, the limitation of nor- 
mality on marginal distributions holds. 

Various aspects of some of the basic problems involved in the present study 
have previously been investigated by Karl Pearson [15], Aitken [1], Wilks [20], 
Birnbaum Paulson and Andrews [3], Votaw, Rafferty and Deemer [19], Camp- 
bell [4], Des Raj [17], and the author [8], [9], [10]. A more complete bibliography 
of related papers can be found in reference [7]. 


3. Estimating means, standard deviations, and correlation coefficients. For
a random sample of n (fixed) measured observations $(x_{1\alpha}, x_{2\alpha}, \dots, x_{p\alpha})$,
$\alpha = 1, 2, \dots, n$, drawn from a population distributed according to (1), subject
to a restriction on observation of variable $x_1$, the logarithm of the likelihood
function is

(2) $L = -(np/2)\ln 2\pi + (n/2)\ln|\sigma^{ij}| - \tfrac{1}{2}\sum_{\alpha=1}^{n}\sum_{i,j=1}^{p} \sigma^{ij}(x_{i\alpha} - m_i)(x_{j\alpha} - m_j) + \ln G(m_1, \sigma_{11})$,

where $G(m_1, \sigma_{11})$ is a restriction function which depends upon the type of
restriction imposed with respect to observation of $x_1$ by screening or acceptance
criteria. When G is to be interpreted with full generality, it is not only a
function of $m_1$ and $\sigma_{11}$, but also of $x_1$. By thus introducing G, much repetition in the
derivation of estimators is avoided which otherwise would arise with the various
selection criteria to be considered. Specific examples of G are given subsequently
in this paper.

For an unrestricted sample, $G(m_1, \sigma_{11}) = 1$, and maximum likelihood
estimates of parameters $m_i$ and $\sigma^{ij}$ are obtained by equating to zero the partial
derivatives of L with respect to these parameters and solving the resulting
system of equations. (Cf. for example Mood [14], pp. 186-188.) In the cases
involving restricted or screened samples, we follow a similar procedure.
However, in order to avoid certain complications which restrictions on $x_1$ introduce
into derivatives with respect to $\sigma^{ij}$, we employ derivatives with respect to $\sigma_{rr}$ and
$\rho_{rs}$. According to the notation employed here, $\sigma_{ij} = \sigma_i\sigma_j\rho_{ij}$ and $\sigma_{ii} = \sigma_i^2$, where
$\rho_{ij}$ is the coefficient of correlation between $x_i$ and $x_j$.

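Considering first the means, we have

(3) (a) $\partial L/\partial m_1 = n\sum_{i=1}^{p} \sigma^{i1} C_{i0} + (1/G)\,\partial G/\partial m_1$,

    (b) $\partial L/\partial m_r = n\sum_{i=1}^{p} \sigma^{ir} C_{i0}$, $r = 2, 3, \dots, p$,

where

(4) $C_{ij} = \sum_{\alpha=1}^{n} (x_{i\alpha} - m_i)(x_{j\alpha} - m_j)/n$ and $C_{i0} = \sum_{\alpha=1}^{n} (x_{i\alpha} - m_i)/n$.

On setting $\partial L/\partial m_r = 0$ ($r \ge 2$) and dividing by $C_{10}$, we obtain the system
of $p - 1$ equations:

(5) $\sum_{i=1}^{p} \sigma^{ir} C_{i0}/C_{10} = 0$, $r = 2, 3, \dots, p$.

On solution, these yield the result

(6) $C_{i0}/C_{10} = A_{1i}/A_{11}$,

where $A_{ij}$ is the cofactor of $\sigma^{ij}$ in $|\sigma^{ij}|$. Since $\|\sigma^{ij}\| = \|\sigma_{ij}\|^{-1}$, then $\sigma_{ij} = A_{ij}/|\sigma^{ij}|$ and
$A_{ij} = \sigma_{ij}|\sigma^{ij}|$. On substituting this result into (6), we have

(7) $C_{i0}/C_{10} = [\sigma_{1i}|\sigma^{ij}|]/[\sigma_{11}|\sigma^{ij}|] = \sigma_{1i}/\sigma_{11}$.

After making the further substitution $\sigma_{1i} = \sigma_1\sigma_i\rho_{1i}$, it follows from (7) that

(8) $C_{i0} = \rho_{1i}(\sigma_i/\sigma_1)C_{10}$.

We turn now to the variances and to the correlation coefficients. Since $C_{ij} = C_{ji}$
and $\sigma^{ij} = \sigma^{ji}$, we need only the derivatives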


oL ij 1 aG 
(a) ion ye ou} + G den’ 


» = -#{1-Lercu\, 


OCs 20% 


P * . 
(c) ob —N6G;0, {o" >; Cyo"o" 


OPrs 


tml 
- 2m Dudas Cido"te® + o"'o*(1 — 85 D+ ororash, 


oo 12,-*-,p- 1;s = 2,3,---,p;r<s, 


where 4;; is a generalized form of Kronecker’s delta such that it has the value 
1 if o‘c* = oo, but otherwise it has the value zero. 

We equate to zero the $(p - 1)$ derivatives $\partial L/\partial\sigma_{rr}$ of (9b) and the $p(p - 1)/2$
derivatives $\partial L/\partial\rho_{rs}$ of (9c) to form a total of $(p - 1)(p + 2)/2$ equations that
are linear in $C_{ij}$ ($i \le j$). These we now write as


Pp 
2, ¢" Ce -l= 0, 


tml 


P > » 2 . . . 
(10) bs Cu e*e" + Lr 2. Cilo"o% + oo*(1 pe 575) $- oo" 855 


i=l i<j 
—o" =0, r=1,2,-:-,p—1;8s =2,---,pir<s. 
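As a solution of this system of equations, we obtain $C_{ij}$ in terms of $C_{11}$ as

(11) $C_{ij} = \sigma_{ij} + (\sigma_{1i}\sigma_{1j}/\sigma_{11})(C_{11}/\sigma_{11} - 1)$, $i \le j$,

which can be verified by direct substitution back into the equations of (10).
As a special case of (11), we have

(11a) $C_{1j} = (\sigma_{1j}/\sigma_{11})C_{11}$.

Returning now to the definitions for $C_{ij}$ and $C_{i0}$ as given in (4), we can write

$C_{ij} - C_{i0}C_{j0} = \sum_\alpha (x_{i\alpha} - m_i)(x_{j\alpha} - m_j)/n - \big[\sum_\alpha (x_{i\alpha} - m_i)/n\big]\big[\sum_\alpha (x_{j\alpha} - m_j)/n\big]$

$= \big[\sum_\alpha x_{i\alpha}x_{j\alpha}/n - m_i\bar{x}_j - m_j\bar{x}_i + m_i m_j\big] - \big[\bar{x}_i\bar{x}_j - m_i\bar{x}_j - m_j\bar{x}_i + m_i m_j\big]$,

and thus

(12) $C_{ij} - C_{i0}C_{j0} = \sum_\alpha x_{i\alpha}x_{j\alpha}/n - \bar{x}_i\bar{x}_j$,

where $\bar{x}_i = \sum_\alpha x_{i\alpha}/n$.

With restricted sample standard deviations written as $\bar{s}_i$ and restricted sample
correlation coefficients written as $\bar{r}_{ij}$, where

(13) $\bar{s}_i = \Big[\sum_{\alpha=1}^{n} x_{i\alpha}^2/n - \Big(\sum_{\alpha=1}^{n} x_{i\alpha}/n\Big)^2\Big]^{1/2}$,
     $\bar{r}_{ij} = \Big[\sum_{\alpha=1}^{n} x_{i\alpha}x_{j\alpha} - \tfrac{1}{n}\sum_{\alpha=1}^{n} x_{i\alpha}\sum_{\alpha=1}^{n} x_{j\alpha}\Big]\Big/ n\bar{s}_i\bar{s}_j$,

Eq. (12) becomes

(14) $C_{ij} - C_{i0}C_{j0} = \bar{r}_{ij}\bar{s}_i\bar{s}_j$.

Since $\sigma_{ij} = \sigma_i\sigma_j\rho_{ij}$, Eqs. (8) and (11) permit us to write

(15) $C_{ij} - C_{i0}C_{j0} = \sigma_i\sigma_j[\rho_{ij} - \lambda\rho_{1i}\rho_{1j}]$,

where

(16) $\lambda = 1 - \bar{s}_1^2/\sigma_1^2$.

Equating the right side of (14) to the right side of (15), we have

(17) $\bar{r}_{ij}\bar{s}_i\bar{s}_j = \sigma_i\sigma_j[\rho_{ij} - \lambda\rho_{1i}\rho_{1j}]$.

We let $i = j$, and since $\bar{r}_{jj} = 1$ and $\rho_{jj} = 1$, it follows from (17) that

(18) $\sigma_j^2 = \bar{s}_j^2/(1 - \lambda\rho_{1j}^2)$, $j = 2, 3, \dots, p$.

Let $i = 1$, eliminate $\sigma_j$ between (17) and (18), and we have

(19) $\rho_{1j} = \bar{r}_{1j}\big/\sqrt{1 - \lambda(1 - \bar{r}_{1j}^2)}$, $j = 2, 3, \dots, p$.

Use (18) to write first $\sigma_i$ and then $\sigma_j$. Substitute these results into (17) and
simplify to obtain

(20) $\rho_{ij} = \bar{r}_{ij}\sqrt{(1 - \lambda\rho_{1i}^2)(1 - \lambda\rho_{1j}^2)} + \lambda\rho_{1i}\rho_{1j}$,

with $i, j = 2, 3, \dots, p$, $i < j$.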
Estimates $\hat{m}_1$ and $\hat\sigma_1$ are yet to be determined, but Eqs. (8), (18), (19) and
(20) enable us to express estimators for the remaining parameters of (1) in
terms of these two as

(21) $\hat{m}_j = \bar{x}_j - \bar{r}_{1j}(\bar{s}_j/\bar{s}_1)(\bar{x}_1 - \hat{m}_1)$,

     $\hat\sigma_j = \bar{s}_j\sqrt{[1 - \hat\lambda(1 - \bar{r}_{1j}^2)]/(1 - \hat\lambda)}$,

     $\hat\rho_{ij} = \dfrac{\bar{r}_{ij} - \hat\lambda(\bar{r}_{ij} - \bar{r}_{1i}\bar{r}_{1j})}{\sqrt{[1 - \hat\lambda(1 - \bar{r}_{1i}^2)][1 - \hat\lambda(1 - \bar{r}_{1j}^2)]}}$,

with $i = 1, 2, \dots, p - 1$, $j = 2, 3, \dots, p$, $i < j$, and $\hat\lambda = 1 - \bar{s}_1^2/\hat\sigma_1^2$. Since
by definition $\bar{r}_{11} = 1$, the last equation of (21), in agreement with (19), simplifies
to

(22) $\hat\rho_{1j} = \bar{r}_{1j}\big/\sqrt{1 - \hat\lambda(1 - \bar{r}_{1j}^2)}$,

when $i = 1$. Here and throughout this paper, the maximum likelihood symbol
(^) serves to distinguish estimates from the parameters estimated.

Although the derivations were somewhat more laborious, the above results
were given earlier in [9]. Estimators for restricted samples from a bivariate
normal population as given in [8] and [10] now follow as a special case of (21)
and (22) with $j = p = 2$.
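
As a rough illustration of how Eqs. (21) and (22) might be evaluated numerically, the following Python sketch computes the remaining estimates from restricted-sample statistics once estimates of $m_1$ and $\sigma_1$ are available. The function name `remaining_estimates` and its argument layout are assumptions made here for illustration and are not part of the paper; index 0 stands for the restricted variable $x_1$.

```python
import math

def remaining_estimates(xbar, s, r, m1_hat, sigma1_hat):
    """Hypothetical helper: evaluate Eqs. (21)-(22).

    xbar, s : restricted-sample means and standard deviations (index 0 is x_1)
    r       : symmetric matrix of restricted-sample correlations, r[i][i] == 1
    m1_hat, sigma1_hat : estimates for the restricted variable, found separately
    """
    p = len(xbar)
    lam = 1.0 - (s[0] / sigma1_hat) ** 2                 # lambda-hat, Eq. (16)
    # d[j] = 1 - lambda * (1 - r_1j^2); note d[0] == 1 because r[0][0] == 1
    d = [1.0 - lam * (1.0 - r[0][j] ** 2) for j in range(p)]
    m_hat = [m1_hat] + [
        xbar[j] - r[0][j] * (s[j] / s[0]) * (xbar[0] - m1_hat)
        for j in range(1, p)
    ]
    sigma_hat = [sigma1_hat] + [
        s[j] * math.sqrt(d[j] / (1.0 - lam)) for j in range(1, p)
    ]
    rho_hat = [[1.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1, p):
            num = r[i][j] - lam * (r[i][j] - r[0][i] * r[0][j])
            rho_hat[i][j] = rho_hat[j][i] = num / math.sqrt(d[i] * d[j])
    return m_hat, sigma_hat, rho_hat
```

For $i = 1$ the numerator collapses to $\bar{r}_{1j}$, so the same loop reproduces (22) without a special case. Fed with the weight and height figures quoted in Sec. 5, the sketch should return values close to those reported there.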

To estimate m; and o; , we substitute (7) into (3a) and (11a) into (9a), equate 
to zero, and thereby obtain 


ou oL 
n Om, 
== 
n Oou o 


ou G@ _4 
nG dm, , 


P . 
= Cro » oo + 
(23) 


P . 
= SB Mg ee ug 
- a 


nG Oon 


Since $\sum_i \sigma^{ir}\sigma_{is} = \delta_{rs}$ (cf., for example, [14], p. 179), where $\delta_{rs} = 1$ if $r = s$,
and $\delta_{rs} = 0$ if $r \ne s$, it follows that $\sum_{i=1}^{p} \sigma^{i1}\sigma_{i1} = 1$, and with the defining
relations for $C_{10}$ and $C_{11}$ as given by (4), this result enables us to write

(24) $\sum_{\alpha=1}^{n} (x_{1\alpha} - m_1)/n + \dfrac{\sigma_{11}}{nG}\dfrac{\partial G}{\partial m_1} = 0$,

     $\sum_{\alpha=1}^{n} (x_{1\alpha} - m_1)^2/n - \sigma_{11}\Big[1 - \dfrac{2\sigma_{11}}{nG}\dfrac{\partial G}{\partial\sigma_{11}}\Big] = 0$.

The required estimates $\hat{m}_1$ and $\hat\sigma_1$ are the values found on solving this pair of
equations. The restriction function $G(m_1, \sigma_{11})$, which depends upon the nature
of the restrictions imposed on $x_1$, must be specified before Eqs. (24) are completely
determined, but it is to be noted that regardless of G, they involve only the
$x_1$-marginal distribution and are independent of the remaining variables.

Truncated and censored samples. When samples under consideration have been
truncated or censored with respect to $x_1$, estimating Eqs. (24) reduce to forms
identical with those obtained previously in reference [7] in connection with
various types of truncated and censored samples from univariate normal
populations. They can be solved as therein described for the univariate cases. For
example, when $x_1$ is singly truncated on the left at a fixed terminal $x_{10}$, then
$G(m_1, \sigma_{11}) = [I_0(\xi)]^{-n}$, where $I_0(\xi) = \int_\xi^\infty \phi(t)\,dt$, $\phi(t) = (2\pi)^{-1/2}\exp(-t^2/2)$,
and $\xi = (x_{10} - m_1)/\sigma_1$. In this case, estimating Eqs. (24) reduce to

(25) (a) $\dfrac{1}{Z - \xi}\Big[\dfrac{1}{Z - \xi} - \xi\Big] = \dfrac{n\sum_\alpha (x_{1\alpha} - x_{10})^2}{\big[\sum_\alpha (x_{1\alpha} - x_{10})\big]^2}$,

     (b) $\hat\sigma_1 = \sum_\alpha (x_{1\alpha} - x_{10})\big/[n(Z - \hat\xi)]$,

     (c) $\hat{m}_1 = x_{10} - \hat\sigma_1\hat\xi$,

where

(26) $Z(\xi) = \phi(\xi)/I_0(\xi) = \exp(-\xi^2/2)\Big/\int_\xi^\infty \exp(-t^2/2)\,dt$.

Equation (25a) can be solved for $\hat\xi$, so that $\hat\sigma_1$ and $\hat{m}_1$ follow in turn from
(25b) and (25c). For further details, reference is again made to [7]. Whenever
$m_1$ and $\sigma_1$ are known a priori, the remaining parameters can be estimated from
(21) with $\hat\lambda = 1 - \bar{s}_1^2/\hat\sigma_1^2$ replaced by $\lambda = 1 - \bar{s}_1^2/\sigma_1^2$ and $\hat{m}_1$ replaced by the
known value of $m_1$.
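
The paper solves (25a) with the help of published tables. As a hedged alternative, the sketch below solves the same equation by plain bisection, using the reading of Eqs. (25) and (26) in which the left side of (25a) is $[1/(Z - \xi)][1/(Z - \xi) - \xi]$. The function names, the bracketing interval $[-5, 5]$, and the assumption that the left side increases with $\xi$ over that interval are choices made here for illustration only.

```python
import math

def hazard_z(xi):
    """Z(xi) = phi(xi) / I0(xi), with I0 the upper-tail normal area, Eq. (26)."""
    phi = math.exp(-0.5 * xi * xi) / math.sqrt(2.0 * math.pi)
    upper_tail = 0.5 * math.erfc(xi / math.sqrt(2.0))
    return phi / upper_tail

def solve_truncated(n, sum_dev, sum_sq_dev, x10, lo=-5.0, hi=5.0):
    """Hypothetical solver for Eqs. (25a)-(25c), left truncation at x10.

    sum_dev    : sum of (x_1a - x10) over the n retained observations
    sum_sq_dev : sum of (x_1a - x10)^2 over the same observations
    """
    target = n * sum_sq_dev / sum_dev ** 2        # right side of (25a)

    def lhs(xi):
        w = 1.0 / (hazard_z(xi) - xi)
        return w * (w - xi)

    # Bisection; assumes lhs(lo) < target < lhs(hi) and a single crossing.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if lhs(mid) < target:
            lo = mid
        else:
            hi = mid
    xi = 0.5 * (lo + hi)
    sigma1 = sum_dev / (n * (hazard_z(xi) - xi))  # Eq. (25b)
    m1 = x10 - sigma1 * xi                        # Eq. (25c)
    return xi, sigma1, m1
```

Called as `solve_truncated(108, 2301.0, 64169.0, 119.5)`, which uses the sample sums quoted in Sec. 5, this should land near the tabled solution reported there; small differences in the last digits would reflect table rounding rather than a different method.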

Selected samples. When sampling procedure is such that a total of N
unrestricted observations are made with respect to $x_1$ although, as a result of selection
or screening, there may be only $n$ ($< N$) observations of $x_2, \dots, x_p$, then

$G(m_1, \sigma_{11}) = (\sqrt{2\pi\sigma_{11}})^{-(N-n)} \exp\Big[-\sum_{\alpha=n+1}^{N} (x_{1\alpha} - m_1)^2/2\sigma_{11}\Big]$,

and Eqs. (24) lead to the familiar estimates

(27) $\hat{m}_1 = \sum_{\alpha=1}^{N} x_{1\alpha}/N = \bar{x}_1$, $\hat\sigma_1^2 = \sum_{\alpha=1}^{N} (x_{1\alpha} - \bar{x}_1)^2/N$.

Regardless of how the selection which determines subsequent observation with
respect to $x_2, \dots, x_p$ is made, $\hat{m}_1$ and $\hat\sigma_1$ are given by (27) while the other
estimates are given by (21), where $\bar{x}_i$, $\bar{s}_i$, and $\bar{r}_{ij}$ are computed from observations
of the n "selected" members of the sample.

Unrestricted samples. When no sample restrictions are imposed, and no
selection is made, then not only is $G = 1$, but $N = n$, $\lambda = 0$, and the required
estimates follow from (27) and (21) as

(28) $\hat{m}_i = \bar{x}_i$, $\hat\sigma_i = s_i$, $\hat\rho_{ij} = r_{ij}$, $i, j = 1, 2, \dots, p$,

which, as already mentioned, are well known for this case. The bars are
omitted over $r_{ij}$ and $s_i$ in (28) since here the computations are based on the
complete rather than a restricted sample.


4. Reliability of estimates. Asymptotic variances and covariances of estimates
given in the preceding section can, of course, be obtained from the likelihood
information matrices with elements which are expected values of the second
partial derivatives of the likelihood function L. These variances and covariances
are of the order of 1/n, but exact expressions for them are too unwieldy to be of
much practical value. For parameters of the restricted variable, in this case
$x_1$, asymptotic variances and covariances given in [7] for truncated and censored
samples from univariate normal distributions are applicable when restrictions
are of these types. When a selection based on $x_1$ is made which does not restrict
observation of $x_1$ itself, then complete sample variances

$V(\hat{m}_1) = \sigma_1^2/N$,
$V(\hat\sigma_1) = \sigma_1^2/2N$,

are applicable as are various exact small sample results based on the $x_1$ marginal
distribution. If the restrictions involved have not been unduly severe, that is,
if only minor portions of the tails of the $x_1$ distribution have been affected, then
asymptotic variances and covariances for complete (unrestricted) samples from
a multinormal distribution will afford reasonably satisfactory approximations to
the desired values. (Cf. Kendall [13], Vol. II, third edition, p. 38.)


5. Practical applications. The practical application of estimators obtained in
this paper is illustrated with a sample given by Baten [2], and attributed by
him to H. C. Carver. The basic sample consists of weight, height, shoulder, chest,
waist, and hip measurements on 119 individuals. We designate these variates
in the order listed as $x_1, x_2, x_3, x_4, x_5$, and $x_6$ respectively. Baten's data include
120 sets of measurements, but it was necessary to eliminate the last one because
of a typographical error. As given by Baten, the sample was considered to be
complete, but for purposes of the illustrations here, it is arbitrarily truncated
with respect to weight ($x_1$) at 119.5 pounds. Thereby eleven sets of measurements
are eliminated. Estimates of the population parameters are then computed
considering the sample as truncated with $n = 108$, and as censored with $n = 108$
and $n_1 = 11$. A complete summary of estimates calculated for each of these two
cases is included in Table 2 along with corresponding estimates computed from
the complete sample. As can be observed from this table, estimates based on the
truncated and censored samples are in close agreement with those computed
from the complete sample. The computing procedures employed are illustrated
below.

Truncated sample: number of missing observations unknown. For this case, the
sample data are summarized in Table 1.

To estimate parameters of $x_1$, we may follow the procedure described in [7]
and first solve equation (25a), which for this example is

$\dfrac{1}{Z - \xi}\Big[\dfrac{1}{Z - \xi} - \xi\Big] = \dfrac{108(64{,}169.00)}{(2{,}301.0)^2} = 1.308928.$

Thereby, we obtain $\hat\xi = -1.379$, and from (25b) we compute $\hat\sigma_1 = 13.7697$, and
from (25c) $\hat{m}_1 = 119.5 - (13.7697)(-1.379) = 138.4884$. Tables [11] were
employed to reduce the computing effort which otherwise would have been
required.

From Eq. (16), we compute $\hat\lambda = 1 - \bar{s}_1^2/\hat\sigma_1^2 = 1 - (11.8419/13.7697)^2 =
0.2604$, and the remaining estimates are obtained from Eqs. (21). For illustration,
specimen computations are given below.

$\hat{m}_2 = 67.9241 - 0.4701(2.4008/11.8419)(21.3056 - 18.9884) = 67.7033$,

$\hat\sigma_2 = 2.4008\sqrt{[1 - 0.2604(1 - 0.4701^2)]/(1 - 0.2604)} = 2.4923$,

$\hat\rho_{12} = 0.4701\big/\sqrt{1 - (0.2604)(1 - 0.4701^2)} = 0.5265$,

$\hat\rho_{23} = \dfrac{0.2361 - 0.2604[0.2361 - (0.4701)(0.4326)]}{\sqrt{[1 - 0.2604(1 - 0.4701^2)][1 - 0.2604(1 - 0.4326^2)]}}$.

TABLE 1
Summary of Sample Data

Truncation at $x_1 = 119.5$ lbs.

$\bar{r}_{12} = 0.4701$
$\bar{r}_{13} = 0.4326$
$\bar{r}_{14} = 0.6501$
$\bar{r}_{15} = 0.4415$
$\bar{r}_{16} = 0.7873$
$\bar{r}_{23} = 0.2361$
$\bar{r}_{24} = 0.1194$

$\sum_\alpha (x_{1\alpha} - x_{10}) = 2301.0$, $\sum_\alpha (x_{1\alpha} - x_{10})^2 = 64169.00$
TABLE 2
Summary of Estimates 
Estimates Based on Restricted Sample 
Parameters Truncated Censored 


Number Missing | Number Missing 
Observations Unknown Observations Known 


—1.379 —1.342 


138.4884 138 .2382 
67 .7033 67 .6794 
16.3899 16.3834 
35.2581 35.2370 
28.0159 
35.3780 





13.7697 





Sample size 





Censored sample: number of missing (unmeasured) observations known. The
sample data remain unchanged from the previous case except for the additional
information that $n_1 = 11$. To estimate parameters of $x_1$, we determine $\hat\xi$ by
solving

$\dfrac{1}{Y - \xi}\Big[\dfrac{1}{Y - \xi} - \xi\Big] = \dfrac{n\sum_\alpha (x_{1\alpha} - x_{10})^2}{\big[\sum_\alpha (x_{1\alpha} - x_{10})\big]^2} = 1.308928$,

where

$Y(\xi) = \dfrac{n_1}{n}\Big[\exp(-\xi^2/2)\Big/\Big(\sqrt{2\pi} - \int_\xi^\infty \exp(-t^2/2)\,dt\Big)\Big] = \dfrac{n_1}{n}Z(-\xi)$,

in the same manner as for the truncated case, and this time find $\hat\xi = -1.342$.
Subsequently we compute $\hat\sigma_1 = \sum_\alpha (x_{1\alpha} - x_{10})\big/[n(Y - \hat\xi)] = 13.9629$. We
then calculate $\hat{m}_1 = 119.5 - (13.9629)(-1.342) = 138.2382$. Using (16), we
have $\hat\lambda = 1 - (11.8419/13.9629)^2 = 0.2806$. For further details, reference is
again made to [7]. With $\hat{m}_1$ and $\hat\sigma_1$ thus determined, these values along with the
original sample data are substituted into (21) to obtain estimates of the
remaining parameters.
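
The censored computation can be sketched in the same way as the truncated one. The Python fragment below assumes that the estimating equation has the same form as (25a) with $Y(\xi) = (n_1/n)Z(-\xi)$ in place of $Z$, which is how the garbled display above is read here; the function names and the bracketing interval $[-5, 0]$ (censoring point below the mean) are illustrative assumptions, not the author's procedure, which relied on tables.

```python
import math

def censored_y(xi, n, n1):
    """Y(xi) = (n1/n) * phi(xi) / Phi(xi), i.e. (n1/n) * Z(-xi)."""
    phi = math.exp(-0.5 * xi * xi) / math.sqrt(2.0 * math.pi)
    lower_tail = 0.5 * math.erfc(-xi / math.sqrt(2.0))   # Phi(xi)
    return (n1 / n) * phi / lower_tail

def solve_left_censored(n, n1, sum_dev, sum_sq_dev, x10, lo=-5.0, hi=0.0):
    """Hypothetical solver: n measured values, n1 unmeasured values below x10."""
    target = n * sum_sq_dev / sum_dev ** 2

    def lhs(xi):
        w = 1.0 / (censored_y(xi, n, n1) - xi)
        return w * (w - xi)

    # Bisection; assumes lhs(lo) < target < lhs(hi) on the chosen bracket.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if lhs(mid) < target:
            lo = mid
        else:
            hi = mid
    xi = 0.5 * (lo + hi)
    sigma1 = sum_dev / (n * (censored_y(xi, n, n1) - xi))
    m1 = x10 - sigma1 * xi
    return xi, sigma1, m1
```

With `n=108`, `n1=11`, and the same sums as in the truncated case, the result should be close to the estimates quoted in the text for the censored sample.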

Although not complete in all details, the above calculations serve to indicate 
the general manner in which results of this paper are applicable in practical 
problems. To a certain extent, they also serve to indicate the degree of agree- 
ment to be expected among corresponding estimates based on truncated, censored 
and complete (unrestricted) samples. 
COMPONENTS OF VARIANCE ANALYSIS FOR
PROPORTIONAL FREQUENCIES


By J. D. Bankier and R. E. Walpole


McMaster University and Virginia Polytechnic Institute


1. Summary. With the exception of papers by G. W. Snedecor, G. M. Cox,
and H. F. Smith ([8], [9], [10]), there seems to be little about proportional
frequencies in the literature. In this paper we consider two-way crossed
classifications and two-way nested classifications. The expected values of the sums of
squares are obtained in a form which is applicable to a variety of components
of variance models. The tests of several hypotheses are considered.


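2. The Type I model for two-way crossed classifications. We consider an
experiment in which p treatments are applied to q blocks. The ith treatment is
applied to the jth block $n_{ij}$ times. The $n_{ij}$'s having been displayed in a matrix
with $n_{ij}$ in the ith row and jth column, we assume that the $n_{ij}$'s in a given row
are proportional to the $n_{ij}$'s in any other row. This implies that

(1) $n_{ij} = n_{i.}n_{.j}/N$, $n_{i.} = \sum_{j=1}^{q} n_{ij}$, $n_{.j} = \sum_{i=1}^{p} n_{ij}$, $N = \sum_{i=1}^{p} n_{i.}$.

Consider the model

(2) $Y_{ijk_{ij}} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij} + \epsilon_{ijk_{ij}}$, $i = 1, 2, \dots, p$; $j = 1, 2, \dots, q$;
$k_{ij} = 1, 2, \dots, n_{ij}$, where the $\epsilon_{ijk_{ij}}$'s are NID $(0, \sigma^2)$ and the parameters are
subject to the conditions

(3) $\sum_{i=1}^{p} n_{i.}\tau_i = \sum_{j=1}^{q} n_{.j}\beta_j = \sum_{i=1}^{p} n_{i.}(\tau\beta)_{ij} = \sum_{j=1}^{q} n_{.j}(\tau\beta)_{ij} = 0$.

If we denote $E(Y_{ijk_{ij}})$ by $\xi_{ij}$, the above conditions are equivalent to defining

$\mu = \xi_{..}$, $\tau_i = \xi_{i.} - \xi_{..}$, $\beta_j = \xi_{.j} - \xi_{..}$, $(\tau\beta)_{ij} = \xi_{ij} - \xi_{i.} - \xi_{.j} + \xi_{..}$,

where

$\xi_{i.} = \dfrac{1}{n_{i.}}\sum_{j} n_{ij}\xi_{ij}$, $\xi_{.j} = \dfrac{1}{n_{.j}}\sum_{i} n_{ij}\xi_{ij}$, $\xi_{..} = \dfrac{1}{N}\sum_{i,j} n_{ij}\xi_{ij}$.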

A more realistic model, such as is considered by Anderson and Bancroft [1], will 
be studied in a later section. 


We now rewrite Eqs. (2) in a form where the theory given by Anderson and 
Bancroft [1] may be applied. They may be put in the form

(4) $Y_{ijk_{ij}} = \mu + \sum_{i'=1}^{p} U_{i'i}\tau_{i'} + \sum_{j'=1}^{q} V_{j'j}\beta_{j'} + \sum_{i',j'} W_{i'j'ij}(\tau\beta)_{i'j'} + \epsilon_{ijk_{ij}}$,

where

$U_{i'i} = \delta_{i'i}$, $V_{j'j} = \delta_{j'j}$, $W_{i'j'ij} = \delta_{i'i}\delta_{j'j}$,

$\delta_{ij}$ being the Kronecker $\delta$. If we order the $Y_{ijk_{ij}}$, calling them $Y_\alpha$ ($\alpha = 1, 2,
\dots, N$), we may write Eqs. (4) in the vector form

(5) $Y = \mu + \sum_{i=1}^{p} U_i\tau_i + \sum_{j=1}^{q} V_j\beta_j + \sum_{i,j} W_{ij}(\tau\beta)_{ij} + \epsilon$.


Denoting the elements of the vector U; by Ui. , we define 


2 Uia ae tia = Ura — U5 


a=1 


N 
= y Uia = > Ni, Uis' - 


awl t’=1 


Similarly, 


@ 

nj naj > > 
j= N”’ Wi; cS NV’ oe V53" = het Ny 5" Wisi 7" = 0. 

j'= ’,j’ 


Changing our notation, we denote by U,;, V;, Wi; , the vectors UJ, V5J ,Wi,I, 
where J is a column vector all of those N elements are equal to unity. Then, if 
we set 


u,= U,— Ui, 13 = V;—V;, w= Wis — Wis, 


we may write Eq. (5) in the form 
q @ 
(6) Y=urt+ re usts + 2 058; + > wis(7B)iz + €. 
i= = iJ 


It is necessary that the u,’s, v;’s, and w,;’s form a linearly independent set of 
vectors. Since this is not the case, we use the conditions (3) to eliminate 
tT», Ba, (rB)ig(t = 1, 2, --- , p) and (78)p(j = 1, 2, --- , g — 1), obtaining 


14+ E(u 1 May) re O(n - Bn) Bs 


t—1 


@) 1¢@—1 - . 
ij : pi Pq : 
+ il 2m Ns Npi + en) (78)iz + «. 


We note that * 
and a similar statement may be made about the coefficient vectors of $\beta_j$ and
$(\tau\beta)_{ij}$. Making use of the relations


U;.Uy = NBs ; U;.V; = Nij,; Ui.We; = 153549" 9 V;5.V = 1, 3653" ; 
ViWix = nizdj7, Wis Weg = ni biwvbjy , 


it may be proved that the coefficient vectors of the $\tau_i$'s, $\beta_j$'s, $(\tau\beta)_{ij}$'s form three
sets of linearly independent vectors and a vector from any set is orthogonal
to the vectors of the other two sets. Thus, when the three sets are combined,
they form a set of linearly independent vectors.

We shall be interested in testing three hypotheses

$H_1: (\tau\beta)_{ij} = 0$, $i = 1, 2, \dots, p - 1$; $j = 1, 2, \dots, q - 1$,
$H_2: \tau_i = 0$, $i = 1, 2, \dots, p - 1$,
$H_3: \beta_j = 0$, $j = 1, 2, \dots, q - 1$.

The restrictions (3) imply that not only the parameters in a given hypothesis
are zero but also all other parameters of the same kind. To test $H_1$, we first
compute

$SSE = \sum_{i,j,k_{ij}} [Y_{ijk_{ij}} - m - t_i - b_j - (tb)_{ij}]^2$,

where $m$, $t_i$, $b_j$ and $(tb)_{ij}$ are the least squares estimates of $\mu$, $\tau_i$, $\beta_j$, and $(\tau\beta)_{ij}$,
respectively, and SSE is the minimized value of the residual sum of squares.

Next, we compute $SSE_1$, the corresponding minimum obtained under the
assumption that $H_1$ holds. Then

$R = \sum_{\alpha=1}^{N} y_\alpha^2 - SSE$,

where $y_\alpha = Y_\alpha - \bar{Y}$, is the reduction in the sum of squares when all the
parameters are used while

$R_1 = \sum_{\alpha=1}^{N} y_\alpha^2 - SSE_1$

is the reduction due to the parameters left when $H_1$ is true. The additional
reduction in the sum of squares due to the $(\tau\beta)_{ij}$'s is

$SS(TB) = R - R_1 = SSE_1 - SSE$.

In the same way, $SSE_2$ and $SSE_3$ denote the minima obtained subject to $H_2$
and $H_3$, respectively, and the reductions in the sum of squares due to the $\tau_i$'s
and the $\beta_j$'s are

$SST = SSE_2 - SSE$ and $SSB = SSE_3 - SSE$,

respectively. Anderson and Bancroft [1] show that

$\sum_{\alpha=1}^{N} y_\alpha^2 = SST + SSB + SS(TB) + SSE$,

and that, subject to the corresponding hypotheses, SST, SSB, SS(TB) and
SSE are independently distributed as $\chi^2\sigma^2$ with $p - 1$, $q - 1$, $(p - 1)(q - 1)$,
and $N - pq$ degrees of freedom. The hypotheses $H_1$, $H_2$, and $H_3$ are tested by
the statistics

$F_1 = \dfrac{MS(TB)}{MSE}$, $F_2 = \dfrac{MST}{MSE}$, $F_3 = \dfrac{MSB}{MSE}$,

respectively, where MSE, for example, is SSE divided by the corresponding
number of degrees of freedom.


3. The sums of squares. The following theorem, a slight generalization of 
one stated by Mann [5], will be used in computing the sums of squares. 
THEOREM A. If


E(Y) = ul + Xin 
and 
(1) l=), 83S p, 


k=l 


(2) X,, X2,--- , X, form a mutually orthogonal set of vectors, 
(3) mr = 0, dm < 0, 
kal k= 
(4) any number of other conditions hold for t141, Te42,°** , Tp, SUCH that the 
method of Lagrange multipliers may be used, 
then condition (3) may be ignored in the minimizing of 


SSE = (y — > Xin) 
kewl 


Our estimates of $\mu$, $\tau_i$, $\beta_j$, $(\tau\beta)_{ij}$ are $m$, $t_i$, $b_j$, $(tb)_{ij}$, respectively, where
these values minimize SSE subject to the conditions (3). By Theorem A, we
may ignore the conditions on the $\tau_i$'s and the $\beta_j$'s. The conditions on the $(\tau\beta)_{ij}$'s
will have to be considered in the computation of $SSE_2$ and $SSE_3$ but in the
computation of SSE they can be avoided by expressing SSE in a different form.

We have

$SSE = \sum_{i,j,k_{ij}} (Y_{ijk_{ij}} - \xi_{ij})^2$.

Taking partial derivatives, we find our estimate of $\xi_{ij}$ is $\hat\xi_{ij} = \bar{Y}_{ij.}$, where this
notation indicates an average over the missing subscript. Then, by the
invariance property of such estimators,

$m = \bar{Y}_{...}$, $t_i = \bar{Y}_{i..} - \bar{Y}_{...}$, $b_j = \bar{Y}_{.j.} - \bar{Y}_{...}$,
$(tb)_{ij} = \bar{Y}_{ij.} - \bar{Y}_{i..} - \bar{Y}_{.j.} + \bar{Y}_{...}$,

$SSE = \sum_{i,j,k_{ij}} (Y_{ijk_{ij}} - \bar{Y}_{ij.})^2 = \sum_{i,j,k_{ij}} Y_{ijk_{ij}}^2 - \sum_{i,j} \dfrac{Y_{ij.}^2}{n_{ij}}$,

where $Y_{ij.}$ is the sum of all the observations on the ith treatment in the jth
block.

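In obtaining $SSE_1$, all of the conditions (3) may be ignored, but, to
determine $SSE_2$ and $SSE_3$, the method of Lagrange multipliers must be used. As
a result of these calculations, we find that

$SS(TB) = \sum_{i,j} n_{ij}(\bar{Y}_{ij.} - \bar{Y}_{i..} - \bar{Y}_{.j.} + \bar{Y}_{...})^2$,

$SST = \sum_{i=1}^{p} n_{i.}(\bar{Y}_{i..} - \bar{Y}_{...})^2 = \sum_{i=1}^{p} \dfrac{Y_{i..}^2}{n_{i.}} - \dfrac{Y_{...}^2}{N}$,

$SSB = \sum_{j=1}^{q} n_{.j}(\bar{Y}_{.j.} - \bar{Y}_{...})^2 = \sum_{j=1}^{q} \dfrac{Y_{.j.}^2}{n_{.j}} - \dfrac{Y_{...}^2}{N}$.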

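4. Other models. We still assume that

$Y_{ijk_{ij}} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij} + \epsilon_{ijk_{ij}}$.

For the Type II model we assume that the $\tau_i$'s, $\beta_j$'s, $(\tau\beta)_{ij}$'s and $\epsilon_{ijk_{ij}}$'s are NID
with zero means and variances $\sigma_\tau^2$, $\sigma_\beta^2$, $\sigma_{\tau\beta}^2$, and $\sigma^2$, respectively. For the Type
III model we assume that the $\tau_i$'s, $\beta_j$'s, and $(\tau\beta)_{ij}$'s come from finite independent
populations of size $P > p$, $Q > q$, and $PQ$, respectively, with zero means and
variances

$\sigma_\tau^2 = \dfrac{\sum_{i=1}^{P} \tau_i^2}{P - 1}$, $\sigma_\beta^2 = \dfrac{\sum_{j=1}^{Q} \beta_j^2}{Q - 1}$, $\sigma_{\tau\beta}^2 = \dfrac{\sum_{i,j}^{P,Q} (\tau\beta)_{ij}^2}{(P - 1)(Q - 1)}$.

The assumption of zero means implies that

$\sum_{i=1}^{P} \tau_i = 0$, $\sum_{j=1}^{Q} \beta_j = 0$, $\sum_{i,j}^{P,Q} (\tau\beta)_{ij} = 0$,

and, in addition, we assume that

$\sum_{i=1}^{P} (\tau\beta)_{ij} = \sum_{j=1}^{Q} (\tau\beta)_{ij} = 0$.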


For the mixed model we may assume that the $\tau_i$'s, $\beta_j$'s, and $(\tau\beta)_{ij}$'s are of any
of the types described above. In addition, when the $\tau_i$'s, say, are of Type I
and the $\beta_j$'s of Type II, Anderson and Kempthorne ([1], [2]) have shown that
it is desirable to assume that, corresponding to each $\beta_j$, there exists a population
of $(\tau\beta)_{ij}$'s consisting of p elements such that

$\sum_{i=1}^{p} (\tau\beta)_{ij} = 0$, $\sigma_{\tau\beta}^2 = \dfrac{\sum_{i=1}^{p} (\tau\beta)_{ij}^2}{p - 1}$.

If the $\tau_i$'s came from a Type III population, we would replace p by P in the
above definitions, and if the roles of the $\tau_i$'s and $\beta_j$'s were interchanged, we
would interchange i and j and replace p by q. We always assume the $\epsilon_{ijk_{ij}}$'s are
NID $(0, \sigma^2)$.


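5. The expected values of the sums of squares. In every case we shall
arbitrarily begin with the sums of squares obtained for the Type I model. To
determine their expected values, we shall make use of the following theorem which is
a slight generalization of one stated by Tukey [11].

THEOREM B. If $y_1, y_2, \dots, y_p$ have means $\mu_1, \mu_2, \dots, \mu_p$, variances $\sigma_1^2, \sigma_2^2, \dots, \sigma_p^2$, and
every pair has the same covariance, $\lambda$, then

$E\sum_{i=1}^{p} n_i(y_i - \bar{y}_.)^2 = \sum_{i=1}^{p} n_i(\mu_i - \bar\mu_.)^2 + \sum_{i=1}^{p} n_i\Big(1 - \dfrac{n_i}{N}\Big)(\sigma_i^2 - \lambda)$,

where $\bar{y}_. = \sum_{i=1}^{p} n_i y_i / N$, with $\bar\mu_.$ defined similarly and $N = \sum_{i=1}^{p} n_i$.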
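
Theorem B can be checked without simulation. The weighted sum of squares is the quadratic form $y'Ay$ with $A = \mathrm{diag}(n_i) - nn'/N$, and the standard identity $E(y'Ay) = \mathrm{tr}(A\Sigma) + \mu'A\mu$ gives its expectation exactly. The Python sketch below evaluates both sides under the reading of the theorem given here, with weighted means $\bar{y}_. = \sum n_i y_i/N$; the function names and the numerical inputs in the usage line are hypothetical.

```python
def expected_weighted_ss(n, mu, var, cov):
    """E sum n_i (y_i - ybar)^2 via E[y'Ay] = tr(A S) + mu'A mu.

    A = diag(n) - n n'/N; S has var[i] on the diagonal and cov elsewhere.
    """
    total_n = sum(n)
    p = len(n)
    acc = 0.0
    for i in range(p):
        for j in range(p):
            a_ij = (n[i] if i == j else 0.0) - n[i] * n[j] / total_n
            s_ij = var[i] if i == j else cov
            acc += a_ij * (s_ij + mu[i] * mu[j])
    return acc

def theorem_b_rhs(n, mu, var, cov):
    """Right-hand side of Theorem B as reconstructed in the text."""
    total_n = sum(n)
    mu_bar = sum(ni * mi for ni, mi in zip(n, mu)) / total_n
    between = sum(ni * (mi - mu_bar) ** 2 for ni, mi in zip(n, mu))
    spread = sum(ni * (1.0 - ni / total_n) * (vi - cov) for ni, vi in zip(n, var))
    return between + spread
```

For instance, `expected_weighted_ss([3, 6, 9], [1.0, -2.0, 0.5], [2.0, 1.5, 3.0], 0.4)` and `theorem_b_rhs` called with the same arguments should agree to within rounding error.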
We find that 


Pp ¢ 
SST => on:(w;—)*, SSB = 2 nly; — 9), 
i=1 =1 


Pa P@Ais 
SS(TB) = a nij(zs —2.)’, SSE = d (eisns; — %y.)*, 
‘J 8 .5ukes 5 


where 
w, = ri + (7rB)s. + &.., ys = By + (rB).5 + E., 


2, = (r8)i; — (rB).. + diy. — &.. 
and 


@ @ 
> 2.8; D 2.578) s5 
et ’ (7B). ax = 
N ; N 
Pp Pa 
oe D (78) is fF D> ni(78) sj 
(78).4 = ne (r8).. = — ee: 
In order to apply Theorem B, we need the variances and covariances of the 
w;, yj, and z; in a form that does not depend on the form of the model. By 
using the methods employed by Bennett and Franklin [3], we find for the Type 
III model that 


a> E(w,) - (1 ~ by) 75, g.= 0, 


2 1\ 2 1\ 1 
f= a, (1 — 5) ot + b(t -4) 4 
2
Or 


a ae pat on 2 
r 5 — on SS PN? (om Moe, 


where 6, = 0 if the 7,;’s come from a Type I population, 6, = 1 otherwise, and 
a similar definition holds for 6, . 


Application of Theorem B enables us to find E(SST) and division by p — 1 
gives us 


E(MST) 


+ Fa na 


Similarly 
E(MSB) 


where 


5,sab (1 — 548) a nis(78)" sj 


BEKTER @ 0 er — iy? + ie 


Finally, by the theory for the Type I model, we know that SSE is distributed
as $\chi^2\sigma^2$ with $N - pq$ degrees of freedom and hence $E(MSE) = \sigma^2$. We also note
that, if all the $n_{ij} = 1$, $SSE = 0$ and it is impossible to carry out any of the
F tests which involve division by this quantity.


6. Models with no interaction. In this case, for the Type I model,
$Y_{ijk_{ij}} = \mu + \tau_i + \beta_j + \epsilon_{ijk_{ij}}$.
We find, as in Sec. 2, that
of that section plays the role of SSE. We also saw in Sec. 2 that
$SSE_1 = SSE + SS(TB)$
so, if we had accepted

$H_1: (\tau\beta)_{ij} = 0$,

and decided to change models in midstream, all that would be necessary to
obtain $SSE_1$ would be to pool the interaction and error sums of squares. We
find the same expressions for SST and SSB as in Sec. 3 and that the degrees
of freedom associated with $SSE_1$ are those obtained by pooling the degrees of
freedom associated with SS(TB) and SSE. To obtain the expected values of
MST and MSB one need only omit the terms involving the $(\tau\beta)_{ij}$, and it is
easily verified that $E(MSE_1) = \sigma^2$. A discussion as to when pooling is desirable
is to be found in Bechhofer's thesis [2] and in a paper by Bozivich, Bancroft
and Hartley [4].


7. Distributions of the sums of squares. Corresponding to the hypotheses 


$H_1: (\tau\beta)_{ij} = 0, \quad H_2: \tau_i = 0, \quad H_3: \beta_j = 0,$
we have the hypotheses
$\sigma_{\tau\beta} = 0, \quad \sigma_\tau = 0, \quad \sigma_\beta = 0,$


if the corresponding variables are from other than a Type I population. Then,
since the populations have zero means, it follows that the corresponding
variables are equal to zero. We have already referred to the tests for the above
hypotheses for the Type I model at the end of Sec. 2. If there is no interaction
term, subject to $H_2$ and $H_3$, the sums of squares SST and SSB reduce to the
corresponding expressions for the Type I model and the tests of Sec. 6 apply
no matter which model we may be considering. If there is an interaction term,
the same argument shows that the Type I test can be used for $H_1$. Thus our
problem is reduced to testing $H_2$ and $H_3$ when there is interaction and we are
not dealing with a Type I model.

We first consider the Type II model where the parameters are NID with zero
means and variances $\sigma_\tau^2$, $\sigma_\beta^2$, and $\sigma_{\tau\beta}^2$. If all the $n_{ij}$'s are equal to n, say, using
methods similar to those of Mood [6], it can be shown that SST, SSB, SS(TB)
and SSE are independently distributed as $\chi^2 E(\mathrm{MST})$, $\chi^2 E(\mathrm{MSB})$, $\chi^2 E[\mathrm{MS(TB)}]$,
and $\chi^2\sigma^2$ with $p - 1$, $q - 1$, $(p - 1)(q - 1)$, and $N - pq$ degrees of freedom,
respectively. These results hold independent of the validity of $H_1$, $H_2$, and $H_3$.
Since some of the details differ from those given by Mood we shall outline the
proof of the above results.

The theory for the Type I model shows that 


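$t_i = \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot\cdot\cdot}, \quad b_j = \bar{Y}_{\cdot j\cdot} - \bar{Y}_{\cdot\cdot\cdot}, \quad (tb)_{ij} = \bar{Y}_{ij\cdot} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot j\cdot} + \bar{Y}_{\cdot\cdot\cdot},$


$i = 1, 2, \cdots, p - 1$; $j = 1, 2, \cdots, q - 1$, are distributed independently of
$\mathrm{SSE} = \sum_{i,j,k} (\epsilon_{ijk} - \bar{\epsilon}_{ij\cdot})^2.$
Therefore any function of these statistics is distributed independently of SSE,
and, in particular, this holds for $t_p$, $b_q$, $(tb)_{pj}$ and $(tb)_{iq}$. These results hold for
the particular case where $Y_{ijk} = \epsilon_{ijk}$. Hence


$\bar{\epsilon}_{i\cdot\cdot} - \bar{\epsilon}_{\cdot\cdot\cdot}, \quad \bar{\epsilon}_{\cdot j\cdot} - \bar{\epsilon}_{\cdot\cdot\cdot}, \quad \bar{\epsilon}_{ij\cdot} - \bar{\epsilon}_{i\cdot\cdot} - \bar{\epsilon}_{\cdot j\cdot} + \bar{\epsilon}_{\cdot\cdot\cdot},$
$i = 1, 2, \cdots, p$; $j = 1, 2, \cdots, q$, are distributed independently of SSE. It may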


be shown that any variable of the above three types is independent of any 


variable of the other two types by computing the appropriate covariances. We 
know that 


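$\mathrm{SST} = qn \sum_{i=1}^{p} (w_i - \bar{w}_{\cdot})^2$


where
$w_i = \tau_i + \overline{(\tau\beta)}_{i\cdot} + \bar{\epsilon}_{i\cdot\cdot}, \quad E(w_i) = 0, \quad \operatorname{var}(w_i) = \sigma_\tau^2 + \frac{\sigma_{\tau\beta}^2}{q} + \frac{\sigma^2}{qn},$


$\operatorname{cov}(w_i, w_j) = 0 \ (i \ne j), \quad E(\mathrm{MST}) = \sigma^2 + n\sigma_{\tau\beta}^2 + qn\sigma_\tau^2.$


It follows that SST / E(MST) has a $\chi^2$ distribution with $p - 1$ degrees of
freedom. Similarly SSB is distributed as $\chi^2 E(\mathrm{MSB})$ with $q - 1$ degrees of
freedom.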

Consider the three sets of variables 


(7B):. — (7B)... (7B).5 — (7B)... Bag — (7B)s. — (7B). + (7B)... 


As with the ¢€;,’s, it may be shown that any variable of the above three types 
is independent of any variable of the other two types. Then it follows that the 
three sets of variables 


Ww — B., Wi - 9.5 Gy 8, 


are independently distributed and hence so are SST, SSB, and SS(TB). 
If, in the results for the Type I model, we set yu, the 7,’s and the 8,;’s equal to 
zero and assume the (78),;’s are NID (0, os), 


Yin = (7B)is + Gx, Yi;. = (rB)is + €ij. 
2 
E(¥y,) =0, var (Yu.) = om + —, 


and the Y,;.’s are independent. We carry out an analysis of variance on the 


Y,;.’s according to the model of Sec. 6, where there is no interaction, with the N 
of that section equal to pg and n;; = 1, to obtain 


SSE, = SSE + SS(TB) = $\sum_{i,j}^{p,q} (\bar{Y}_{ij\cdot} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot j\cdot} + \bar{Y}_{\cdot\cdot\cdot})^2$
since, under these conditions, the $Y_{ij}$ of that section is equal to $\bar{Y}_{ij\cdot}$. The
theory of Sec. 6, when we replace o° by o73 + o°/n, tells us that SSE, is dis- 
tributed as x*(no7s + o”) and hence 


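$\mathrm{SS(TB)} = \sum_{i,j} n(\bar{Y}_{ij\cdot} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot j\cdot} + \bar{Y}_{\cdot\cdot\cdot})^2$


is distributed as $\chi^2(n\sigma_{\tau\beta}^2 + \sigma^2)$ with $pq - p - q + 1 = (p - 1)(q - 1)$ degrees
of freedom. It then follows that the appropriate tests for $H_1$, $H_2$, $H_3$ are given
by the statistics


$F_1 = \frac{\mathrm{MS(TB)}}{\mathrm{MSE}}, \quad F_2 = \frac{\mathrm{MST}}{\mathrm{MS(TB)}}, \quad F_3 = \frac{\mathrm{MSB}}{\mathrm{MS(TB)}},$


respectively. A proof of these results is also outlined by Anderson and Bancroft
[1].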
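
As a rough illustration of the three ratios above, the following Python sketch computes the mean squares and the statistics $F_1$, $F_2$, $F_3$ for a balanced layout ($n_{ij} = n$). The function names and the nested-list data layout (`y[i][j]` holding the n observations of cell (i, j)) are assumptions made for this example; they are not notation from the paper.

```python
def mean_squares(y):
    """Mean squares for a balanced two-way layout.

    y[i][j] is a list of n observations for level i of the first
    classification and level j of the second (same n in every cell).
    """
    p, q, n = len(y), len(y[0]), len(y[0][0])
    cell = [[sum(y[i][j]) / n for j in range(q)] for i in range(p)]
    row = [sum(cell[i]) / q for i in range(p)]
    col = [sum(cell[i][j] for i in range(p)) / p for j in range(q)]
    grand = sum(row) / p

    sst = q * n * sum((r - grand) ** 2 for r in row)
    ssb = p * n * sum((c - grand) ** 2 for c in col)
    sstb = n * sum((cell[i][j] - row[i] - col[j] + grand) ** 2
                   for i in range(p) for j in range(q))
    sse = sum((obs - cell[i][j]) ** 2
              for i in range(p) for j in range(q) for obs in y[i][j])
    return {
        "T": sst / (p - 1),
        "B": ssb / (q - 1),
        "TB": sstb / ((p - 1) * (q - 1)),
        "E": sse / (p * q * (n - 1)),   # N - pq = pq(n - 1) when n_ij = n
    }


def type2_f_statistics(ms):
    """F1 tests H1 against MSE; F2 and F3 use MS(TB) as denominator."""
    return ms["TB"] / ms["E"], ms["T"] / ms["TB"], ms["B"] / ms["TB"]


# Hypothetical data with p = q = n = 2.
data = [[[1, 3], [2, 4]], [[5, 7], [9, 11]]]
print(type2_f_statistics(mean_squares(data)))
```

The sketch covers only the balanced case; as the following paragraph of the text shows, the ratios need not follow the F distribution when the cell numbers differ.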

The above results are for the case where $n_{ij} = n$. If this condition does not
hold, we can no longer say that $F_2$ and $F_3$ have the F distribution. This may be
shown by considering the special case where $p = 3$, $q = 2$, $n_{11} = n_{12} = 1$,
$n_{21} = n_{22} = 2$, $n_{31} = n_{32} = 3$ and $N = 12$. Then the moment generating function
of SST is


$[1 - 4(11x + 3\sigma^2)t/3 + 4(36x^2 + 22x\sigma^2 + 3\sigma^4)t^2/3]^{-1/2}$


where $x = \sigma_\tau^2 + \sigma_{\tau\beta}^2/2$. This is not the moment generating function of a variable
of the form $c\chi^2$ unless $x = 0$. Thus there is no hope of $F_2$ having an F distribution
and a similar argument holds for $F_3$.

For a Type III model with interaction we cannot expect to obtain the
distributions necessary for F tests of $H_2$ and $H_3$ since the $(\tau\beta)_{ij}$'s are not normally
distributed. An approach similar to the one given above could be used in the
case of the mixed model.


8. The two-way nested classifications. This model is discussed by Bennett
and Franklin [3]. We assume that


$Y_{ijk} = \mu + \tau_i + \beta_{j(i)} + \epsilon_{ijk},$

$\sum_{i=1}^{p} n_{i\cdot}\tau_i = 0, \quad \sum_{j=1}^{q} n_{ij}\beta_{j(i)} = 0.$
We test two hypotheses,

$H_1: \beta_{j(i)} = 0, \quad i = 1, 2, \cdots$
and
$H_2: \tau_i = 0,$

using the statistics, for the Type I model,


$F_1 = \frac{\mathrm{MSB}}{\mathrm{MSE}}$ (with $p(q - 1)$ and $N - pq$ degrees of freedom),
and


$F_2 = \frac{\mathrm{MST}}{\mathrm{MSE}}$ (with $p - 1$ and $N - pq$ degrees of freedom),


where SST and SSE have the values given earlier and


$\mathrm{SSB} = \sum_{i,j}^{p,q} n_{ij}(\bar{Y}_{ij\cdot} - \bar{Y}_{i\cdot\cdot})^2 = \sum_{i,j}^{p,q} \frac{Y_{ij\cdot}^2}{n_{ij}} - \sum_{i=1}^{p} \frac{Y_{i\cdot\cdot}^2}{n_{i\cdot}}.$


The Type II model is defined as before but, in the case of the Type III model, 
we assume that the 7,’s come from a finite population of size P, mean zero, and 


variance 
Ye 
e oa t=1 yuan 
""P-1 
while the 8 ;:,)’s come from P populations of size Q, corresponding to the different 


values of i, these populations being independent of each other and the popu- 
lation of 7,;’s, with zero means and common variance 


2 9 
> Bw 
2 j=l 
> Q oat 


ll 


q 72 
E(MST) = o? + —%#4 _ Pp hnZ | a3 








(p — 1)N* Lisi Q 
as Pp 
- ae o; + a 6.) > n.73, 
BS) p—1 im 
9 I oy a=» ) ef 9 ” 
E(MSB) = of + — 2289 gt U-OS, eesE = 0’, 
p(q — 1) P(q — 1) Fa 


where the 4’s have the meaning assigned in Sec. 5 and 


P q 2 
DL ni dn’; 
r i=l r j=l 
a=N- - . ee 
N N 
Examination of MSB indicates that, no matter what model we may use, we
may test the hypothesis $H_1$ by the statistic $F_1$ given earlier in this section. For
a Type II model with $n_{ij} = n$, we test $H_2$ with the statistic
$F_2' = \frac{\mathrm{MST}}{\mathrm{MSB}}.$


For other cases an approximate method must be used such as is given elsewhere
in the literature [7].


SUMS OF INDEPENDENT TRUNCATED RANDOM VARIABLES¹


By J. M. Shapiro


The Ohio State University 


1. Summary and introduction. Let $(x_{nk})$, $(k = 1, 2, \cdots, k_n;\ n = 1, 2, \cdots)$
be a double sequence of infinitesimal (i.e. $\lim_{n\to\infty} \max_{1 \le k \le k_n} P\{|x_{nk}| > \epsilon\} = 0$
for every $\epsilon > 0$) random variables such that for each n, $x_{n1}, \cdots, x_{nk_n}$ are
independent. Let $S_n = x_{n1} + \cdots + x_{nk_n}$ and let $F_n(x)$ be the distribution function
of $S_n$. For any $a > 0$ let the random variables $x^a_{nk}$ be defined by


$x^a_{nk} = \begin{cases} x_{nk}, & \text{if } -a < x_{nk} \le a, \\ 0, & \text{otherwise}, \end{cases}$
and let $F_n^a(x)$ be the distribution function of $S_n^a = x^a_{n1} + \cdots + x^a_{nk_n}$. In the next
section certain necessary and sufficient conditions are given for $F_n^a(x)$ to converge
$(n \to \infty)$ to a limiting distribution and in particular it is shown that if $F_n^a(x)$
converges to $F(x)$, then $F(x)$ has finite moments of all orders. In Sec. 3 it is shown
that if $F_n^a(x)$ converges to $F(x)$, then for each positive integer k the kth moment
of $F_n^a(x)$ approaches the kth moment of $F(x)$ as $n \to \infty$.
We shall call the random variables $(x_{nk})$ a truncated system if there exists a
$b > 0$ independent of k and n such that $P\{|x_{nk}| > b\} = 0$. We note that if we
start with a truncated system we can choose $a > 0$ such that $x^a_{nk} = x_{nk}$.
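
The truncation rule can be made concrete with a few lines of Python. This is only an illustrative sketch: the names `truncate` and `truncated_row_sum`, and the representation of one row $x_{n1}, \cdots, x_{nk_n}$ as a list of numbers, are assumptions for the example.

```python
def truncate(x, a):
    """Return x if -a < x <= a, and 0 otherwise (the variable x^a of the text)."""
    return x if -a < x <= a else 0.0


def truncated_row_sum(row, a):
    """Sum of the truncated variables of one row, i.e. a realization of S_n^a."""
    return sum(truncate(x, a) for x in row)


# Hypothetical realization of one row, truncated at a = 1.
row = [0.25, -3.0, 0.5, 1.5]
print(truncated_row_sum(row, 1.0))   # only 0.25 and 0.5 survive truncation
```

If every entry of every row already lies within $(-a, a]$, truncation changes nothing, which is the remark made above for truncated systems.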


2. Conditions for convergence. Since the random variables $(x_{nk})$ are
infinitesimal and independent within each row, it is clear that the random variables
$(x^a_{nk})$ are also. From a well-known theorem of Khintchine (cf. [1]), it follows
that for the weak convergence of $F_n(x)$ (or $F_n^a(x)$) to a limiting distribution
$F(x)$, $F(x)$ must be infinitely divisible.

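Let $F(x)$ be any infinitely divisible distribution function and let $\varphi(t)$ be its
characteristic function. According to the formulas of Levy and Khintchine [1]
for the representation of the characteristic function of an infinitely divisible
distribution we have


(2.1) $\log \varphi(t) = i\gamma t + \int_{-\infty}^{\infty} \left(e^{itu} - 1 - \frac{itu}{1 + u^2}\right) \frac{1 + u^2}{u^2}\, dG(u)$


$= i\gamma(\tau)t - \sigma^2 t^2/2 + \int_{-\infty}^{-\tau} (e^{iut} - 1)\, dM(u)$


$+ \int_{\tau}^{\infty} (e^{iut} - 1)\, dN(u) + \int_{-\tau}^{0} (e^{iut} - 1 - iut)\, dM(u)$


$+ \int_{0}^{\tau} (e^{iut} - 1 - iut)\, dN(u),$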

Received September 28, 1956. 


1 Presented to the American Mathematical Society on February 26, 1955 and December
29, 1955.

where $G(u)$ is a bounded nondecreasing function ($G(-\infty) = 0$), $\gamma$ a real constant,


$M(u) = \int_{-\infty}^{u} \frac{1 + z^2}{z^2}\, dG(z)$ for $u < 0$,


$N(u) = -\int_{u}^{\infty} \frac{1 + z^2}{z^2}\, dG(z)$ for $u > 0$,


(2.2)
$\sigma^2 = G(+0) - G(-0)$ and $\gamma(\tau) = \gamma + \int_{|u| < \tau} u\, dG(u) - \int_{|u| \ge \tau} \frac{1}{u}\, dG(u),$


and where $\tau$ and $-\tau$ are continuity points of $N(u)$ and $M(u)$ respectively.
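Let $F_{nk}(x)$ and $F^a_{nk}(x)$ be the distribution functions of $x_{nk}$ and $x^a_{nk}$ respectively.
From the definition of $x^a_{nk}$ we note


(2.3) $F^a_{nk}(x) = \begin{cases} 0, & \text{for } x \le -a, \\ F_{nk}(x) - F_{nk}(-a), & \text{for } -a < x < 0, \\ F_{nk}(x) + 1 - F_{nk}(a), & \text{for } 0 \le x < a, \\ 1, & \text{for } x \ge a. \end{cases}$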


The following theorem (c.f. [1], p. 124) will be needed. 

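THEOREM 1. In order that the distribution functions of the sums $S_n = x_{n1} + \cdots + x_{nk_n}$
of independent infinitesimal random variables converge to the
distribution function $F(x)$, it is necessary and sufficient that:


(1) At continuity points of $M(u)$ and $N(u)$
$\lim_{n\to\infty} \sum_{k=1}^{k_n} F_{nk}(x) = M(x)$, for $x < 0$,


$\lim_{n\to\infty} \sum_{k=1}^{k_n} (F_{nk}(x) - 1) = N(x)$, for $x > 0$;


(2) $\lim_{\epsilon\to 0} \limsup_{n\to\infty} \sum_{k=1}^{k_n} \left\{ \int_{|x|<\epsilon} x^2\, dF_{nk}(x) - \left( \int_{|x|<\epsilon} x\, dF_{nk}(x) \right)^2 \right\}$
$= \lim_{\epsilon\to 0} \liminf_{n\to\infty} \sum_{k=1}^{k_n} \left\{ \int_{|x|<\epsilon} x^2\, dF_{nk}(x) - \left( \int_{|x|<\epsilon} x\, dF_{nk}(x) \right)^2 \right\} = \sigma^2;$


(3) $\lim_{n\to\infty} \sum_{k=1}^{k_n} \int_{|x|<\tau} x\, dF_{nk}(x) = \gamma(\tau),$


where $M(u)$, $N(u)$, $\sigma^2$, and $\gamma(\tau)$ are given by (2.1) and (2.2).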

Now using the notation of (2.1) we have the following theorem. 

THEOREM 2. If for some $a > 0$ $F_n^a(x)$ converges to $F(x)$, then the function $G(u)$
is nonincreasing for $u > a$ and for $u < -a$.

Proof. Since $F_n^a(x)$ converges to $F(x)$, according to Theorem 1 we know that
at continuity points of $M(u)$ and $N(u)$


(2.4) $\lim_{n\to\infty} \sum_{k=1}^{k_n} F^a_{nk}(x) = M(x)$ for $x < 0$ and
$\lim_{n\to\infty} \sum_{k=1}^{k_n} (F^a_{nk}(x) - 1) = N(x)$ for $x > 0$.


Thus from (2.3) and (2.4), since $M(u)$ and $N(u)$ are nondecreasing functions,
we see that $M(u) = 0$ for $u < -a$ and $N(u) = 0$ for $u > a$. Using (2.2) the
conclusion of the theorem follows.

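Now given $F(x)$ infinitely divisible define (using the notation of (2.1) and (2.2))
for any $a > 0$, $\pm a$ continuity points of $G(u)$,


(2.5) $G^a(u) = \begin{cases} 0, & \text{for } u \le -a, \\ G(u) - G(-a), & \text{for } -a < u < a, \\ G(a) - G(-a), & \text{for } u \ge a, \end{cases}$
$\gamma^a = \gamma - \int_{|u| > a} \frac{1}{u}\, dG(u),$


and let $F^a(x)$ be the (infinitely divisible) distribution given by (2.1) using the
function $G^a(u)$ and the constant $\gamma^a$. We note that $F^a(x)$ is also given by (2.1)
using the functions $M^a(u)$ and $N^a(u)$ defined by


(2.6) $M^a(u) = \begin{cases} 0, & \text{for } -\infty < u < -a, \\ M(u) - M(-a), & \text{for } -a \le u < 0, \end{cases}$


$N^a(u) = \begin{cases} 0, & \text{for } a < u < \infty, \\ N(u) - N(a), & \text{for } 0 < u \le a, \end{cases}$
(with $\sigma^2$ unchanged) and
$\gamma^a(\tau) = \begin{cases} \gamma(\tau), & \text{for } \tau \le a, \\ \gamma(a), & \text{for } \tau > a. \end{cases}$


With this notation we have the following theorem.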

THEOREM 3. If $F_n(x)$ converges to $F(x)$, then for any $a > 0$ ($\pm a$ continuity
points of $G(u)$) $F_n^a(x)$ converges to $F^a(x)$. In particular, if $G(u)$ is nonincreasing
outside of the interval $[-a, a]$ then $F_n^a(x)$ converges to $F(x)$.

Proof. Since $F_n(x)$ converges to $F(x)$, parts (1), (2), and (3) of Theorem 1
hold. We note that continuity points of $M(u)$ and $N(u)$ coincide with those of
$G(u)$ so that $-a$ and $a$ are continuity points of $M(u)$ and $N(u)$ respectively.
From (2.3) and (2.6) it follows that at continuity points of $M^a(u)$ and $N^a(u)$


$\lim_{n\to\infty} \sum_{k=1}^{k_n} F^a_{nk}(x) = M^a(x)$ and $\lim_{n\to\infty} \sum_{k=1}^{k_n} (F^a_{nk}(x) - 1) = N^a(x),$


for $x < 0$ and $x > 0$ respectively. Also it is clear that part (2) of Theorem 1
holds with $F_{nk}(x)$ replaced by $F^a_{nk}(x)$ and that


$\lim_{n\to\infty} \sum_{k=1}^{k_n} \int_{|x|<\tau} x\, dF^a_{nk}(x) = \gamma^a(\tau).$


Thus from the sufficiency of Theorem 1 we see that $\lim_{n\to\infty} F_n^a(x) = F^a(x)$ (at
continuity points of $F^a(x)$). We note that if $G(u)$ is nonincreasing outside of
$[-a, a]$ then $F^a(x) = F(x)$. This proves Theorem 3.

Combining Theorems 2 and 3 we can state the following theorem. 

THEOREM 4. If $F_n(x)$ converges to $F(x)$ and if $\pm a$ are continuity points of $G(u)$,
then a necessary and sufficient condition for $F_n^a(x)$ to converge to $F(x)$ is that $G(u) = G(+\infty)$
for $u \ge a$ and $G(u) = G(-\infty) = 0$ for $u \le -a$.

THEOREM 5. If $F_n^a(x)$ converges to $F(x)$, then $F(x)$ has finite moments of all
orders.

Proof. By Theorem 2 we know that $G(u)$ is nonincreasing outside of the
interval $[-a, a]$. In particular it follows that $\int_{-\infty}^{\infty} x^n\, dG(x) < \infty$ for all n. By the
result of [2] it follows that $F(x)$ has finite moments of all orders.

We remark that if the system $(x_{nk})$ is a truncated system we have the following
analogues of Theorems 2 and 5.

THEOREM 2a. If $F_n(x)$ converges to $F(x)$, then the function $G(u)$ is nonincreasing
for $u > a$ and for $u < -a$.


THEOREM 5a. If $F_n(x)$ converges to $F(x)$, then $F(x)$ has finite moments of all
orders.

3. Convergence of moments. In the remainder of this paper we shall assume
that $(x_{nk})$ is a truncated system. If this is not the case, the following results apply
to the system $(x^a_{nk})$ previously discussed.

In view of Theorem 5a it is natural to consider the question of the convergence
of moments of the distribution function $F_n(x)$ of the random variable $S_n$ to the
moments of $F(x)$. The principal result of this section is contained in the following
theorem.


THEOREM 6. If $(x_{nk})$ is a truncated system, and if $F_n(x)$ converges to $F(x)$, then
$\lim_{n\to\infty} \int_{-\infty}^{\infty} x^k\, dF_n(x) = \int_{-\infty}^{\infty} x^k\, dF(x),$


for every positive integer k.

The author first proved this theorem in the special case where F(x) was the 
Poisson distribution (see Bull. Amer. Math. Soc., Vol. 61, Abstract No. 435) 
and where k = 2. This more general form was obtained at a later date (Bull. 
Amer. Math. Soc., Vol. 62, Abstract No. 264). 

The proof of Theorem 6 requires several lemmas which we state and prove 
below. 


Using the same notation as in Section 2, according to the result of [2] we know
that


$\int_{-\infty}^{\infty} x^{2k}\, dF(x) < \infty \iff \int_{-\infty}^{\infty} x^{2k}\, dG(x) < \infty,$
and assuming $F(x)$ has finite moments of all orders that,
(3.1) $\chi_1 = \gamma + \int_{-\infty}^{\infty} u\, dG(u)$ and $\chi_r = \int_{-\infty}^{\infty} (u^{r-2} + u^r)\, dG(u),$


where $\chi_r$ is the rth semi-invariant of $F(x)$. In particular letting $\mu$ be the mean
and $\sigma^2$ the variance of $F(x)$, we see


(3.2) $\mu = \gamma + \int_{-\infty}^{\infty} u\, dG(u)$ and $\sigma^2 = G(+\infty) + \int_{-\infty}^{\infty} u^2\, dG(u).$


Lemma 1. Under the hypothesis of Theorem 6, $\lim_{n\to\infty} \sigma^2(S_n) = \sigma^2$, where $\sigma^2(S_n)$
is the variance of $S_n$ and $\sigma^2$ is the variance of $F(x)$.

Proof. Since $F_n(x)$ converges to $F(x)$, by Theorem 1, page 112 of [1], we have
$G_n(x) = \sum_{k=1}^{k_n} \int_{-\infty}^{x} [u^2/(1 + u^2)]\, dF_{nk}(u + \alpha_{nk}) \to G(x)$ as $n \to \infty$ at all
continuity points of $G(u)$ and also $G_n(+\infty) \to G(+\infty)$, where $\alpha_{nk} = \int_{|x|<\tau} x\, dF_{nk}(x)$
($\tau > 0$ an arbitrary positive constant). (Remark. By hypothesis $P\{|x_{nk}| > a\} = 0$
for some $a > 0$. We may and do take $\tau > a$ so that $\alpha_{nk} = \mu_{nk}$ = mean of $x_{nk}$.
Hence in the remainder of the proof we assume $\alpha_{nk} = \mu_{nk}$.) Now since
$x^2/(1 + x^2) = x^2 - [x^4/(1 + x^2)]$, we see


(3.3) $G_n(+\infty) = \sum_{k=1}^{k_n} \int_{-\infty}^{\infty} x^2\, dF_{nk}(x + \mu_{nk}) - \sum_{k=1}^{k_n} \int_{-\infty}^{\infty} [x^4/(1 + x^2)]\, dF_{nk}(x + \mu_{nk}) \to G(+\infty)$ as $n \to \infty$.


Also,
(3.4) $\int_{-\infty}^{\infty} x^2\, dG_n(x) = \int_{-\infty}^{\infty} x^2\, d\left[ \sum_{k=1}^{k_n} \int_{-\infty}^{x} \frac{u^2}{1 + u^2}\, dF_{nk}(u + \mu_{nk}) \right] = \sum_{k=1}^{k_n} \int_{-\infty}^{\infty} [x^4/(1 + x^2)]\, dF_{nk}(x + \mu_{nk}).$


By Theorem 2a, $G(x)$ is nonincreasing outside of some interval. Now since the
random variables are infinitesimal it follows that $\lim_{n\to\infty} \max_{1 \le k \le k_n} |\alpha_{nk}| = 0$.
Thus since $P\{|x_{nk}| > a\} = 0$ for some $a > 0$, we know that there exists an
$A > 0$ such that $G(x)$ and $G_n(x)$ are nonincreasing for $x < -A$ and $x > A$
$(n = 1, 2, \cdots)$. Therefore by Helly's convergence theorem


(3.5) $\int_{-\infty}^{\infty} x^k\, dG_n(x) = \int_{-A}^{A} x^k\, dG_n(x) \to \int_{-A}^{A} x^k\, dG(x) = \int_{-\infty}^{\infty} x^k\, dG(x)$


as $n \to \infty$. Letting $k = 2$ and using (3.3) and (3.4) we see


$\lim_{n\to\infty} \sum_{k=1}^{k_n} \int_{-\infty}^{\infty} x^2\, dF_{nk}(x + \alpha_{nk}) = G(+\infty) + \int_{-\infty}^{\infty} x^2\, dG(x).$


Now $x_{n1}, x_{n2}, \cdots, x_{nk_n}$ are for each n independent random variables and since
$\alpha_{nk} = \mu_{nk}$, by virtue of (3.2) we see $\lim_{n\to\infty} \sigma^2(S_n) = \sigma^2$. This proves Lemma 1.
Having obtained this result we can now prove that the means $\mu_n$ of $F_n(x)$
approach the mean $\mu$ of $F(x)$.
Lemma 2. Under the hypothesis of Theorem 6,


$\mu_n = \int_{-\infty}^{\infty} x\, dF_n(x) \to \int_{-\infty}^{\infty} x\, dF(x) = \mu$ as $n \to \infty$


(i.e., Theorem 6 holds for $k = 1$.)

Proof. For the proof of this lemma we appeal to Theorem 2 of [1], page 100.
Since the random variables $(x_{nk})$ are infinitesimal and since $\max_{1 \le k \le k_n} |\mu_{nk}| = \max_{1 \le k \le k_n} |\alpha_{nk}| \to 0$
as $n \to \infty$ we see that the random variables $(x_{nk} - \mu_{nk})$
are also infinitesimal. This together with Lemma 1 shows that the hypothesis
of Theorem 2, page 100 of [1] is satisfied and hence we may conclude in
particular that


$\mu_n = \sum_{k=1}^{k_n} \int_{-\infty}^{\infty} x\, dF_{nk}(x) \to \gamma'$ as $n \to \infty$,


where $\gamma'$ is the constant associated with Kolmogorov's formula for the
characteristic function of the infinitely divisible distribution $F(x)$. But the constant of
Kolmogorov's formula is the mean of the distribution (i.e. $\gamma' = \mu$). This proves
Lemma 2.

Lemma 3. Under the hypothesis of Theorem 6


$\sum_{k=1}^{k_n} \int_{-\infty}^{\infty} x^r\, dF_{nk}(x + \mu_{nk}) - \chi_{r(n)} \to 0$ as $n \to \infty$,


$r = 2, 3, \cdots$, where $\chi_{r(n)}$ = rth semi-invariant of $S_n$.
Proof. We note that 


kn o 
- [ SOS + pi) > een SO te ¥ SRS 
kewl © 


kn © 
> | zt dF u(x) — xray) = 0. 
ken —oo 


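Let $\mu_n^{(r)} = \int_{-\infty}^{\infty} (x - \mu_n)^r\, dF_n(x)$


and let $\mu_{nk}^{(r)} = \int_{-\infty}^{\infty} x^r\, dF_{nk}(x + \mu_{nk})$. Now since $(x_{nk})$ is a truncated system, and
since $\max_{1 \le k \le k_n} |\mu_{nk}| \to 0$ as $n \to \infty$ we see


$\max_{1 \le k \le k_n} \left| \int_{-\infty}^{\infty} x^r\, dF_{nk}(x + \mu_{nk}) \right| = \max_{1 \le k \le k_n} \left| \int_{-A}^{A} x^r\, dF_{nk}(x + \mu_{nk}) \right|$


for some $A > 0$. Now given $0 < \epsilon < 1$ we see


$\max_k \left| \int_{-A}^{A} x^r\, dF_{nk}(x + \mu_{nk}) \right| \le \max_k \int_{|x| < \epsilon} |x|^r\, dF_{nk}(x + \mu_{nk})$
$+ \max_k \int_{\epsilon \le |x| \le A} |x|^r\, dF_{nk}(x + \mu_{nk}) \le \epsilon + A^r \max_k P\{|x_{nk} - \mu_{nk}| \ge \epsilon\}$


and since $(x_{nk} - \mu_{nk})$ are infinitesimal we see


(3.6) $\lim_{n\to\infty} \max_{1 \le k \le k_n} \left| \int_{-\infty}^{\infty} x^r\, dF_{nk}(x + \mu_{nk}) \right| = 0, \quad r = 2, 3, \cdots.$


Also we see (for $r \ge 2$)


$\sum_{k=1}^{k_n} |\mu_{nk}^{(r)}| = \sum_{k=1}^{k_n} \left| \int_{-\infty}^{\infty} x^r\, dF_{nk}(x + \mu_{nk}) \right|$


$\le \sum_{k=1}^{k_n} \int_{-A}^{A} |x|^r\, dF_{nk}(x + \mu_{nk}) \le A^{r-2} \sum_{k=1}^{k_n} \int_{-A}^{A} x^2\, dF_{nk}(x + \mu_{nk})$


$= A^{r-2}\sigma^2(S_n) \to A^{r-2}\sigma^2$ as $n \to \infty$


by Lemma 1. Hence


(3.7) $\sum_{k=1}^{k_n} |\mu_{nk}^{(r)}|$


is bounded in n for $r = 2, 3, \cdots$. Let $\chi_r(Z)$ denote the rth semi-invariant of the
random variable Z and let $\mu_Z^{(r)}$ denote the rth central moment of Z. For $r > 3$
we note


(3.8) $\chi_r(Z) = \mu_Z^{(r)} + f(\mu_Z^{(2)}, \cdots, \mu_Z^{(r-2)}),$


where f is a polynomial in $\mu_Z^{(2)}, \cdots, \mu_Z^{(r-2)}$, each term of which is at least degree
2 (cf. [1], page 66). Thus $\chi_r(x_{nk}) = \mu_{nk}^{(r)} + f(\mu_{nk}^{(2)}, \cdots, \mu_{nk}^{(r-2)})$. Now if X and Y
are independent random variables we note ([1], page 64) $\chi_r(X + Y) = \chi_r(X) + \chi_r(Y)$.
Hence since $S_n = x_{n1} + \cdots + x_{nk_n}$ is the sum of independent random
variables we see


(3.9) $\chi_r(S_n) = \chi_{r(n)} = \sum_{k=1}^{k_n} \mu_{nk}^{(r)} + \left\{ \sum_{k=1}^{k_n} f(\mu_{nk}^{(2)}, \cdots, \mu_{nk}^{(r-2)}) \right\}.$


The general term of the expression in braces may be written as
$T = c \sum_{k=1}^{k_n} \prod_{i=1}^{p} \mu_{nk}^{(s_i)}$, where c is a constant, $2 \le s_i < r$, $p \ge 2$ and where $s_i = s_j$
does not imply $i = j$. But


$|T| \le c \max_k |\mu_{nk}^{(s_1)}| \max_k |\mu_{nk}^{(s_2)}| \cdots \max_k |\mu_{nk}^{(s_{p-1})}| \sum_{k=1}^{k_n} |\mu_{nk}^{(s_p)}|.$


Thus by (3.6) and (3.7) we see that $T \to 0$ as $n \to \infty$. Since the number of terms
in f depends only on r this shows that the quantity in braces in (3.9) approaches
zero as $n \to \infty$. This proves the Lemma.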
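
The two facts about semi-invariants used in this proof, the relation to central moments and additivity over independent summands, can be checked numerically for small discrete distributions. The Python sketch below does this for orders 2, 3 and 4 only, using the standard relations $\chi_2 = \mu^{(2)}$, $\chi_3 = \mu^{(3)}$, $\chi_4 = \mu^{(4)} - 3(\mu^{(2)})^2$. The function names and the dictionary representation of a distribution (value mapped to probability) are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import product


def central_moment(dist, r):
    """rth central moment of a finite discrete distribution {value: probability}."""
    mean = sum(p * x for x, p in dist.items())
    return sum(p * (x - mean) ** r for x, p in dist.items())


def semi_invariants(dist):
    """Semi-invariants of orders 2, 3, 4 expressed through central moments."""
    m2, m3, m4 = (central_moment(dist, r) for r in (2, 3, 4))
    return {2: m2, 3: m3, 4: m4 - 3 * m2 ** 2}


def convolve(dist_x, dist_y):
    """Distribution of X + Y for independent X and Y."""
    out = {}
    for (x, p), (y, q) in product(dist_x.items(), dist_y.items()):
        out[x + y] = out.get(x + y, 0) + p * q
    return out


# A fair coin (values 0 and 1) and the sum of two independent copies.
coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}
print(semi_invariants(coin))
print(semi_invariants(convolve(coin, coin)))   # each entry is doubled
```

Exact rational arithmetic is used so that the additivity can be compared with equality rather than within a tolerance.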

Proof of Theorem 6. We note


$\sum_{k=1}^{k_n} \int_{-\infty}^{\infty} x^r\, dF_{nk}(x + \mu_{nk}) = \int_{-\infty}^{\infty} (x^{r-2} + x^r)\, dG_n(x),$
where $G_n(x) = \sum_{k=1}^{k_n} \int_{-\infty}^{x} [u^2/(1 + u^2)]\, dF_{nk}(u + \mu_{nk})$ as defined in Lemma 1.
Now by (3.5)


$\lim_{n\to\infty} \int_{-\infty}^{\infty} x^k\, dG_n(x) = \int_{-\infty}^{\infty} x^k\, dG(x), \quad k = 0, 1, 2, \cdots$
and therefore for $r \ge 2$,
$\lim_{n\to\infty} \int_{-\infty}^{\infty} (x^{r-2} + x^r)\, dG_n(x) = \int_{-\infty}^{\infty} (x^{r-2} + x^r)\, dG(x).$


But $\int_{-\infty}^{\infty} (x^{r-2} + x^r)\, dG(x)$ is by (3.1) the rth semi-invariant of the infinitely
divisible distribution $F(x)$. Thus (for $r > 1$)


(3.10) $\lim_{n\to\infty} \sum_{k=1}^{k_n} \int_{-\infty}^{\infty} x^r\, dF_{nk}(x + \mu_{nk}) = \chi_r$ = rth semi-invariant of $F(x)$.


Using (3.10) and Lemma 3 we obtain
(3.11) $\lim_{n\to\infty} \chi_{r(n)} = \chi_r;$


that is the rth semi-invariant of $F_n(x)$ approaches the rth semi-invariant of
$F(x)$ as $n \to \infty$. Let $\mu^{(r)} = \int_{-\infty}^{\infty} (x - \mu)^r\, dF(x)$. By Lemmas 1 and 2 we have
$\mu_n^{(2)} \to \mu^{(2)}$ as $n \to \infty$. Now $\chi_3 = \mu^{(3)}$, $\chi_{3(n)} = \mu_n^{(3)}$, $\chi_4 = \mu^{(4)} - 3(\mu^{(2)})^2$,
$\chi_{4(n)} = \mu_n^{(4)} - 3(\mu_n^{(2)})^2$ and in general as indicated in (3.8) $\chi_r = \mu^{(r)} + f(\mu^{(2)}, \cdots, \mu^{(r-2)})$,
where f is a polynomial, and $\chi_{r(n)} = \mu_n^{(r)} + f(\mu_n^{(2)}, \cdots, \mu_n^{(r-2)})$.

Using (3.11) and an induction argument we see $\lim_{n\to\infty} \mu_n^{(r)} = \mu^{(r)}$ $(r \ge 2)$ and


this together with Lemma 2 completes the proof of Theorem 6.


ON THE IDENTITY RELATIONSHIP FOR FRACTIONAL REPLICATES
OF THE $2^n$ SERIES¹


By R. C. Burton and W. S. Connor
National Bureau of Standards 


1. Summary. The paper considers $1/2^r$ fractional replication designs of the
factorial series with n factors each at two levels. The identity relationship for
such designs is often written in terms of a symbol I and collections of letters
which denote interactions among the factors. These collections may
conveniently be called "words," and the number of letters in a collection, the "length"
of the word. The problem considered is that of the existence of an identity
relationship which contains words of specified lengths.

It is known that the words of an identity, together with the symbol I, form an
Abelian group. The group contains sets of independent generators, and the
products of such generators. Necessary and sufficient conditions are developed
for the existence of an identity relationship for which the lengths of a set of
independent generators and their products are specified. Further, it is shown how
to construct such an identity relationship, and it is proved that the identity
relationship is unique, apart from renaming the letters.

For the more general case in which the lengths of the words are given—but 
are not associated with particular generators and products—a necessary con- 
dition is developed for the existence of the identity relationship. It is shown by 
example that this condition is not sufficient. 


2. Introduction. We shall consider the case of n factors each at two levels.
Let the factors be denoted by A, B, $\cdots$ and consider a $1/2^r$ fraction of the $2^n$
factorial design $(n > r)$. The identity relationship consists of the symbol I
and $2^r - 1$ words, connected by equality signs, as follows:


$I = A^{a_1}B^{b_1}\cdots = \cdots = A^{a_m}B^{b_m}\cdots,$
where $m = 2^r - 1$; $a_j, b_j, \cdots$ $(j = 1, \cdots, m)$ take on the values 0 or 1; and
$A^0 = B^0 = \cdots = 1$. We shall consider the case in which all n letters are present
in the identity relationship.


The product of any two words,


$A^{a_1}B^{b_1}\cdots$ and $A^{a_2}B^{b_2}\cdots,$


is $A^{a_1 + a_2}B^{b_1 + b_2}\cdots$, where $a_1 + a_2$, $b_1 + b_2$, $\cdots$ are reduced modulo 2; and
the product of any word with I is the word itself. The words of the identity
relationship, together with I, form an Abelian group, in which each word is its
own inverse.


Received September 4, 1956. 


1 Work performed (in part) under the sponsorship of the Bureau of Ships, Navy De- 
partment. 




For a $1/2^r$ fraction, the group contains r words which are independent
generators. The product of any i $(i = 2, \cdots, r)$ of these generators is a word different
from all of the generators. To illustrate, consider $n = 5$ and $r = 3$. An identity
relationship is


I = ABC = CDE = AE = ABDE = BCE = ACD = BD,


where ABC, CDE, and AE form a set of independent generators.
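
The group structure just described is easy to reproduce mechanically: multiplying two words cancels the letters they share, which is the symmetric difference of their letter sets. The following Python sketch builds the words of an identity relationship from a list of generators. The function names `multiply` and `identity_words` are assumptions for this illustration, and taking ABC, CDE, AE as the generators of the example is a choice that reproduces the words displayed above.

```python
from itertools import combinations


def multiply(word_a, word_b):
    """Product of two words: letters occurring in exactly one of them."""
    return "".join(sorted(set(word_a) ^ set(word_b)))


def identity_words(generators):
    """All 2^r - 1 words: the generators followed by their products."""
    words = []
    for size in range(1, len(generators) + 1):
        for subset in combinations(generators, size):
            word = ""
            for generator in subset:
                word = multiply(word, generator)
            words.append(word)
    return words


# The example with n = 5 and r = 3.
print(" = ".join(["I"] + identity_words(["ABC", "CDE", "AE"])))
# I = ABC = CDE = AE = ABDE = BCE = ACD = BD
```

Each word is its own inverse because the symmetric difference of a set with itself is empty, which plays the role of I here.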

Before proceeding with our main argument, it is helpful to review some known
necessary conditions. For this purpose we shall let $w_1, \cdots, w_m$ denote the lengths
of the words. Brownlee, Kelly, and Loraine [1] state that the following conditions
are necessary:

(i) $\sum_{i=1}^{m} w_i = 2^{r-1} n$.

(ii) Either the w's all are even or $2^{r-1}$ of them are odd.

(iii) When $2^{r-1}$ of the w's are odd, the words with even numbers of letters
must, with the identity I, form a subgroup of order $2^{r-1}$.

(iv) If some w = n, the remaining w’s must be divisible into pairs such that 
the total of each pair is n. 

(v) If some w = 1, the remaining numbers must be divisible into pairs such 
that the numbers in each pair differ by 1. 

They then state that “no other necessary conditions appear susceptible to 
expression in simple general terms.” 

These conditions do not require that the lengths of particular generators and 
their products be specified. The problem which we shall solve does make this 
requirement. We shall state necessary and sufficient conditions for the existence 
of an identity relationship, for which the lengths of the generators and their 
products are given. In addition, we shall prove that such an identity relationship 
is unique, apart from the naming of letters; and we shall show how to construct 
the identity relationship. 

For the more general case considered by Brownlee, Kelly, and Loraine, we 
shall derive an additional necessary condition, and shall show by example that 
it, together with the above conditions (i), (ii), (iii), is not sufficient to imply the 
existence of an identity relationship. 

In addition to [1], some papers which treat fractional replicates and the re- 
lated problem of block confounding are listed as references [2], --- , [7]. 


3. The existence of an identity relationship with generators and products of
specified lengths. Let $i_1, i_2, \cdots, i_s$ be s integers such that $0 < i_1 < i_2 < \cdots < i_s < r + 1$.
The ith generator will be denoted by W(i) and the product of the
$i_1$-th, $i_2$-th, $\cdots$ and $i_s$-th generators by $W(i_1, i_2, \cdots, i_s)$. There are exactly $2^r - 1$
words corresponding to the $2^r - 1$ symbols $(i_1, i_2, \cdots, i_s)$. The number of
letters in the word $W(i_1 \cdots i_s)$ will be denoted by $w = w(i_1 \cdots i_s)$.

We shall find it convenient to introduce a symbol S to denote the entire
collection of $2^r - 1$ symbols $(i_1 \cdots i_s)$, a symbol $O = O(i_1 \cdots i_s)$ to denote the
collection of symbols which contain an odd number of indices from $(i_1 \cdots i_s)$, and a
symbol $E = E(i_1 \cdots i_s)$ to denote the collection of symbols which contain none
or an even number of indices from $(i_1 \cdots i_s)$.
Let n(O) and n(E) denote the numbers of symbols in O and E. It is readily
shown that


(3.1) $n(O) = 2^{r-1}, \quad n(E) = 2^{r-1} - 1.$


We note that the distribution of a letter among the words of the identity is
determined by its distribution among the generators. Suppose that a letter occurs
in the s generators $W(i_1), \cdots, W(i_s)$, but not in the remaining generators. Then
it occurs in all products which have an odd number of indices from among
$i_1, \cdots, i_s$. But there are $2^r - 1$ ways in which a letter may be distributed
among the generators, and hence among the words of the identity.

We shall use the symbol $t = t(i_1 \cdots i_s)$ to denote the number of letters which
occur in all of the s generators $W(i_1), \cdots, W(i_s)$ but not in the remaining
generators.

For example, consider three generators W(1) = BCD, W(2) = ACDEF, 
W(3) = ACF. We may illustrate with a Venn diagram which letters are common, 
denoting the generators by the interiors of the circles in Fig. 1. 


Since the t's are the numbers of letters in the basic disjoint sets, it is obvious
that


(3.2) $\sum_{S} t(j_1 \cdots j_p) = n.$


It is also clear that any set of t's which are positive integers or zero and satisfy
(3.2) corresponds to a constructible identity relationship involving n letters.

We shall now show explicitly how the t's uniquely determine the w's, and
conversely the w's uniquely determine the t's.

From the definitions of t, w, and O, it follows that
(3.3) $\sum_{O(i_1 \cdots i_s)} t(j_1 \cdots j_p) = w(i_1 \cdots i_s).$


Fig. 1. Venn diagram of the generators W(1) = BCD, W(2) = ACDEF, W(3) = ACF, for which
t(1) = 1, t(2) = 1, t(3) = 0, t(12) = 1, t(13) = 0, t(23) = 2, t(123) = 1.

There are $2^r - 1$ such equations. These equations uniquely determine the w's
from the t's.


Let us now introduce a dummy variable t(0), which we define to be identically
zero. We may add t(0) to the left-hand side of (3.2) to obtain


(3.4) $t(0) + \sum_{S} t(j_1 \cdots j_p) = n.$

Multiplying (3.3) by 2, and subtracting (3.4) from the product, we obtain
(3.5) $-t(0) + \sum_{O} t(j_1 \cdots j_p) - \sum_{E} t(j_1 \cdots j_p) = 2w(i_1 \cdots i_s) - n,$
where


$O = O(i_1 \cdots i_s)$ and $E = E(i_1 \cdots i_s)$.


We observe that the matrix of coefficients in (3.4) and (3.5) is a Hadamard
matrix of order $2^r$. Dividing these equations by $2^{r/2}$, the matrix of coefficients
becomes an orthogonal matrix.

It is helpful to express the equations in matrix notation. Let t denote the
column vector $[t(0), t(1), \cdots, t(12 \cdots r)]$, z the column vector $2^{-r/2}[n, 2w(1) - n, \cdots, 2w(12 \cdots r) - n]$,
and C the matrix of coefficients in (3.4) and (3.5)
after division by $2^{r/2}$. In this notation the equations may be written as


(3.6) $Ct = z.$
Because C is orthogonal, $C^{-1} = C'$ and
(3.7) $t = C'z.$
The typical equation in (3.7) is
(3.8) $\sum_{O} w(j_1 \cdots j_p) - \sum_{E} w(j_1 \cdots j_p) = 2^{r-1} t(i_1 \cdots i_s).$


These last equations uniquely determine the t's from the w's. If the w's
determine t's which are positive integers or zero, and are such that their sum is n,
then the identity relationship can be constructed, and is unique, except for the
renaming of letters.
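
The passage between the t's and the w's can be traced numerically. In the Python sketch below a symbol $(i_1 \cdots i_s)$ is encoded as a bit mask (bit i - 1 set when generator i belongs to the symbol), so that "an odd number of indices in common" becomes an odd number of shared bits. The encoding and the function names are assumptions made for this illustration; the formulas applied are (3.3) and (3.8).

```python
from fractions import Fraction


def odd_overlap(a, b):
    """True when the symbols a and b share an odd number of indices."""
    return bin(a & b).count("1") % 2 == 1


def t_from_generators(generators):
    """t[mask] = number of letters lying in exactly the generators in mask."""
    t = [0] * (1 << len(generators))
    for letter in set("".join(generators)):
        mask = sum(1 << i for i, g in enumerate(generators) if letter in g)
        t[mask] += 1
    return t


def w_from_t(t):
    """Equation (3.3): w(symbol) is the sum of t over O(symbol)."""
    size = len(t)
    return [sum(t[u] for u in range(1, size) if odd_overlap(u, s))
            for s in range(size)]


def t_from_w(w):
    """Equation (3.8): 2^(r-1) t(symbol) = sum over O of w - sum over E of w."""
    size = len(w)
    result = [Fraction(0)]
    for s in range(1, size):
        odd = sum(w[u] for u in range(1, size) if odd_overlap(u, s))
        even = sum(w[u] for u in range(1, size) if not odd_overlap(u, s))
        result.append(Fraction(odd - even, size // 2))
    return result


# The three generators used for the Venn diagram example.
t = t_from_generators(["BCD", "ACDEF", "ACF"])
w = w_from_t(t)
print(t)            # indexed by mask: 0, (1), (2), (12), (3), (13), (23), (123)
print(w)
print(t_from_w(w))  # recovers t
```

A set of w's corresponds to an identity relationship only when every value returned by `t_from_w` is a non-negative integer and the values sum to n.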

We may sum up in the following theorem. 

THEOREM 1. For a $1/2^r$ fractional replicate of a $2^n$ factorial design, let the numbers
of letters in the words of the identity relationship be $w(1), w(2), \cdots, w(r)$; $w(12)$,
$w(13), \cdots, w(r - 1, r)$; $\cdots$; $w(12 \cdots r)$, where $(1), (2), \cdots, (r)$ refer to a
set of independent generators and $(12), (13), \cdots, (r - 1, r)$; $\cdots$; $(12 \cdots r)$
to their products. With respect to any one of these $2^r - 1$ symbols $(i_1 \cdots i_s)$, the
remaining symbols divide into a collection $O = O(i_1 \cdots i_s)$ of $2^{r-1}$ symbols which
have an odd number of indices in common with the given symbol, and a collection
$E = E(i_1 \cdots i_s)$ of $2^{r-1} - 1$ symbols which have none or an even number of indices
in common with the given symbol. If the w's satisfy the $2^r - 1$ equations


$\sum_{O} w(j_1 \cdots j_p) - \sum_{E} w(j_1 \cdots j_p) = 2^{r-1} t(i_1 \cdots i_s),$


where $\sum_{O} t + \sum_{E} t = n$, in the sense of implying t's which are positive integers
or zero, then the identity relationship exists. Conversely, if the identity relationship
exists, then the equations are satisfied. Furthermore, knowledge of the t's is sufficient
to construct the identity, and the identity corresponding to a set of t's is unique, apart
from renaming the letters.


4. A necessary condition. We now return to the more general problem of the 
existence of an identity relationship having words of given length, but without 
the specification of the lengths of particular generators and their products. 
We shall develop a necessary condition for the identity relationship to exist. 

From (3.6) we have


(4.1) $t'C'Ct = z'z,$
and because $C' = C^{-1}$,
$t't = z'z,$
(4.2) $\sum w^2 = 2^{r-2}\left(\sum t^2 + n^2\right).$

Thus, there must exist n or fewer positive integers (the non-zero t's) which
add to n and satisfy (4.2).

We may consider the special case in which the variance of the w's is a
minimum. The variance of w is defined by

(4.3) $V(w) = \left(\sum w^2 - \left(\sum w\right)^2 / (2^r - 1)\right) / (2^r - 1)$
$= 2^{r-2}\left[(2^r - 1)\sum t^2 - n^2\right] / (2^r - 1)^2.$


Now V(w) is a minimum when $\sum t^2$ is a minimum, i.e., when $\sum t^2 = n$. In this
case (4.2) becomes


(4.4) $\sum w^2 = 2^{r-2} n(n + 1).$


We may sum up in a theorem and corollary.

THEOREM 2. For a $1/2^r$ fractional replicate of a $2^n$ factorial design, let the numbers
of letters in the words of the identity relationship be $w_1, \cdots, w_m$, where $m = 2^r - 1$.
Then a necessary condition for the identity relationship to exist is that there exist n
or fewer positive integers whose sum is n and whose squares add to $2^{-(r-2)} \sum w^2 - n^2$.

COROLLARY. If the variance of w is a minimum, then it is necessary that
$\sum w^2 = 2^{r-2} n(n + 1)$.

We shall show by example that this condition, together with conditions (i),
(ii), (iii), is not sufficient. Conditions (iv) and (v) do not apply
in the example.

Let n = 9 and r = 4. Let the distribution of w’s be as follows: 


Frequency 
7 
6 
2 
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For this distribution, > w = 72 and >> w* = 360. This distribution has minimum 
variance, and satisfies the corollary. In addition, it satisfies (i), (ii), and (iii). 
To see that it satisfies (iii), we have only to write the following identity: 


I = ABCD = ABEF = ACEG = CDEF = BDEG = BCFG = ADFG. 
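The arithmetic behind this example distribution can be checked directly. The sketch below assumes the word lengths 4, 5 and 7 for the frequencies 7, 6 and 2 (the values consistent with the stated totals 72 and 360); the variable names are illustrative and not from the paper.

```python
# Word-length distribution of the example: length -> number of words.
n, r = 9, 4
freq = {4: 7, 5: 6, 7: 2}

m = sum(freq.values())                          # number of words, should be 2^r - 1
sum_w = sum(w * f for w, f in freq.items())     # sum of the w's
sum_w2 = sum(w * w * f for w, f in freq.items())  # sum of the squared w's

# Corollary to Theorem 2: minimum variance requires sum w^2 = 2^(r-2) n (n+1).
corollary_rhs = 2 ** (r - 2) * n * (n + 1)

# Theorem 2: the squares of the non-zero t's must add to 2^(-r+2) sum w^2 - n^2.
sum_t2 = sum_w2 // 2 ** (r - 2) - n * n

print(m, sum_w, sum_w2, corollary_rhs, sum_t2)
```

Here the required sum of squares equals n, so the only admissible set of positive integers adding to n is nine ones; the counterexample therefore has to come from Theorem 1, not from Theorem 2.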


We shall show that the necessary conditions of Theorem 1 are not satisfied. 
To do this, we shall make a unique identification of the w’s with the generators 
and their products.

Let us choose w(1) = 7 and w(2) = 7. Then because two odd lengths imply 
an even length, w(12) must be 4. We choose w(3) = 5, which implies that 
w(13) = w(23) = 4, and w(123) = 5. Finally, we choose w(4) = 5, which com- 
pletely determines the remaining products. The w’s are as follows: 


    w(1) = 7    w(12) = 4    w(123) = 5    w(1234) = 4
    w(2) = 7    w(13) = 4    w(124) = 5
    w(3) = 5    w(14) = 4    w(134) = 5
    w(4) = 5    w(23) = 4    w(234) = 5
                w(24) = 4
                w(34) = 4

Then

    $\sum_{O(1)} w - \sum_{E(1)} w = 4,$


which by (3.8) implies that $t(1) = \tfrac{1}{2}$. Accordingly, the identity relationship
does not exist.
NOTES
WAITING TIMES WHEN QUEUES ARE IN TANDEM 


By Edgar Reich
University of Minnesota 


1. We study the distribution of waiting times when customers proceed to a
second (multiple-counter) queue after having been processed at a first
(multiple-counter) queue.¹ For reasons of expediency we restrict ourselves to the case of
unsaturated queues in "equilibrium," that is, to stationary statistics. The main
results are for the case of exponential service time, where it turns out that,
contrary to a priori intuition, the situation is surprisingly simple. As shown by
Theorem 6, no such simple behavior can be expected when the service time
distributions are even only slightly more general. Theorem 4 was first found
essentially by P. J. Burke [1], by a different method.²

The concept of reversibility of a Markov chain, certain aspects of which are 
discussed in Sec. 2, has turned out to be fruitful in connection with the analysis, 
and is of some independent interest. 


2. A stationary stochastic process N(t) is said to be reversible if N(t) and
N(−t) have the same multivariate distributions. If N(t) is a discrete or continuous
parameter Markov chain with a denumerable state space, say, 0, 1, 2, ⋯,
then N(−t) is a process of the same type. The necessary and sufficient condition
for reversibility becomes


(1)    $\rho_{ij}(t) = p_i P_{ij}(t) = p_j P_{ji}(t) = \rho_{ji}(t), \qquad i, j = 0, 1, 2, \cdots,$


where $p_i$ and $P_{ij}(t)$ are, respectively, the stationary and transition probabilities
of N(t).

Kolmogorov's criterion for reversibility of Markov chains with a finite state
space ([8]; [5], p. 66) may, in a special case, be immediately generalized to the
denumerable state-space case, as follows.

THEOREM 1. Let N(k), k = 0, ±1, ±2, ⋯, be an irreducible stationary discrete-parameter
Markov chain with the state space 0, 1, 2, ⋯, the stationary probabilities
$u_i$, and the single-step transition probabilities $\pi_{ij}$. A necessary and sufficient
condition for the reversibility of N(k) is that


(2)    $\pi_{i_1 i_2}\pi_{i_2 i_3}\cdots\pi_{i_{n-1} i_n}\pi_{i_n i_1} = \pi_{i_1 i_n}\pi_{i_n i_{n-1}}\cdots\pi_{i_3 i_2}\pi_{i_2 i_1}$


for every sequence of non-negative integers $(i_1, i_2, \cdots, i_n, i_1)$ beginning and ending
with the same integer.
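As a rough illustration of condition (2), the following sketch checks the cycle products for a small finite chain. The finite state space, the function name `is_reversible_by_cycles`, and the two example matrices are assumptions made for the illustration; the theorem itself concerns denumerable state spaces.

```python
from itertools import product
from math import isclose, prod

def is_reversible_by_cycles(P, max_len=4):
    """Check condition (2) on every closed sequence of up to max_len states."""
    states = range(len(P))
    for n in range(2, max_len + 1):
        for seq in product(states, repeat=n):
            cyc = seq + (seq[0],)                 # sequence beginning and ending at seq[0]
            forward = prod(P[a][b] for a, b in zip(cyc, cyc[1:]))
            rev = cyc[::-1]                       # same sequence traversed backwards
            backward = prod(P[a][b] for a, b in zip(rev, rev[1:]))
            if not isclose(forward, backward, rel_tol=1e-12, abs_tol=1e-15):
                return False
    return True

# A random walk with nearest-neighbour moves only (reversible).
walk = [[0.5, 0.5, 0.0],
        [0.25, 0.5, 0.25],
        [0.0, 0.5, 0.5]]

# A chain that prefers to rotate 0 -> 1 -> 2 -> 0 (not reversible).
rotor = [[0.0, 0.9, 0.1],
         [0.1, 0.0, 0.9],
         [0.9, 0.1, 0.0]]

print(is_reversible_by_cycles(walk), is_reversible_by_cycles(rotor))
```

For the rotating chain the product around 0, 1, 2, 0 is 0.9³ while the reversed product is 0.1³, so (2) fails on a single cycle.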


Received July 15, 1956; revised November 27, 1956. 

¹ A part of this paper represents work done at The RAND Corporation.

² A special case of a part of this theorem was also treated (unpublished) by H. H. Goode
and R. E. Machol. Their work is to appear in a text.
Proof. According to (1), for t = 1, with $\rho_{ij} = u_i\pi_{ij}$,


(3)    $\rho_{i_1 i_2}\rho_{i_2 i_3}\cdots\rho_{i_{n-1} i_n}\rho_{i_n i_1} = \rho_{i_1 i_n}\rho_{i_n i_{n-1}}\cdots\rho_{i_3 i_2}\rho_{i_2 i_1}$

is necessary. Since $u_i > 0$, we may cancel the $u_i$, obtaining (2). Summing both
sides of (2) over $i_3, i_4, \cdots, i_n$, we find

    $\pi_{i_1 i_2}\,\pi_{i_2 i_1}(n - 1) = \pi_{i_1 i_2}(n - 1)\,\pi_{i_2 i_1}.$


If τ is the period of the chain, then the $\limsup_{n\to\infty}$ of the left and right sides
are $\tau u_{i_1}\pi_{i_1 i_2}$ and $\tau u_{i_2}\pi_{i_2 i_1}$, respectively ([4], p. 331). Hence $u_{i_1}\pi_{i_1 i_2} = u_{i_2}\pi_{i_2 i_1}$.
Eq. (1) follows by induction.

Let us call a finite sequence of non-negative integers, beginning and ending 
in the same integer a cycle if no proper portion begins and ends in the same 
integer. Evidently (2) need hold only for cycles. 

Consider a continuous-parameter, time-homogeneous Markov chain N(t),
with $P_{ij}(h) = \delta_{ij} + hL_{ij} + o(h)$,

    $\sum_{j=0}^{\infty} L_{ij} = 0, \qquad L_{ii} < 0, \qquad i = 0, 1, 2, \cdots.$

A process of this type will be said to be of type A if, in addition,
(i) N(t) has stationary probabilities $p_i$, $\sum_i p_i = 1$, $\sum_i p_i L_{ij} = 0$;
(ii) N(t) has at most a finite number of discontinuities in every finite interval;
$-\sum_i p_i L_{ii} < \infty$;
(iii) the associated discrete-parameter chain N* defined by

    $\pi_{ij} = (\delta_{ij} - 1)L_{ij}/L_{ii}, \qquad i, j = 0, 1, 2, \cdots,$

is irreducible. (Note: This is a chain, of period = 2, resulting from a shift
of the instants at which N(t) changes state to the instants t = ⋯, −1, 0, 1, ⋯.)
THEOREM 2. A necessary and sufficient condition for a continuous-parameter
Markov process of type A to be reversible is that


(4)    $L_{i_1 i_2}L_{i_2 i_3}\cdots L_{i_{n-1} i_n}L_{i_n i_1} = L_{i_1 i_n}L_{i_n i_{n-1}}\cdots L_{i_3 i_2}L_{i_2 i_1}$


for every cycle.
Proof. Since the matrix $P_{ij}(t)$ of a process of type A is uniquely determined
by its values for infinitesimal t, condition (1) is equivalent to


(5)    $p_i L_{ij} = p_j L_{ji}.$

Note that

    $\infty > r^{-1} = -\sum_{i=0}^{\infty} p_i L_{ii} > 0,$

in view of the second part of assumption (ii), above. Thus

    $u_i = -r p_i L_{ii}, \qquad i = 0, 1, \cdots,$

can be assigned as stationary probabilities to N*. Then a necessary and sufficient
condition for N* to be reversible is


    $u_i\pi_{ij} = -r p_i(\delta_{ij} - 1)L_{ij} = -r p_j(\delta_{ij} - 1)L_{ji} = u_j\pi_{ji},$


which is equivalent to (5). The theorem follows from the fact that (4) is equivalent
to (2).

Let B(t) denote a stationary birth-death process with state space 0, 1, 2, ⋯,
the stationary probabilities $p_i$, and


    $P_{i,i+1}(h) = \lambda_i h + o(h), \qquad \lambda_i > 0,$
    $P_{i,i-1}(h) = \mu_i h + o(h), \qquad \mu_i > 0,$
    $P_{ii}(h) = 1 - \lambda_i h - \mu_i h + o(h).$


B(t) is permitted to have at most a finite number of increases (births), and
decreases (deaths) in any finite time interval, $\sum_i p_i(\lambda_i + \mu_i) < \infty$.

THEOREM 3. B(t) is reversible. 

Proof. If (4) is to be other than of the form 0 = 0, the cycle must be of length
3, in which case (4) still holds trivially.³

COROLLARY. If $\lambda_n = \lambda$ (n = 0, 1, ⋯), then the death times of B(t) form a
Poisson process of density λ.

Outline of proof. Since $\lambda_n$ is constant the birth times are Poisson with density
λ. The stochastic process $B_1(t) = B(-t)$ is statistically identical with the process
B(t). But if B(t) is a fixed realization, and $B_1(t) = B(-t)$, then the births of B(t)
become the deaths of $B_1(t)$.
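The reversibility of a birth-death process can be illustrated numerically through condition (5), $p_i L_{ij} = p_j L_{ji}$. The sketch below is a finite truncation with constant birth rate, chosen only for illustration; the truncation, the rates and the helper name `stationary_birth_death` are assumptions, not part of the paper.

```python
def stationary_birth_death(lam, mu):
    """Stationary probabilities on states 0..len(lam), where lam[i] is the birth
    rate out of state i and mu[i] is the death rate out of state i+1."""
    weights = [1.0]
    for birth, death in zip(lam, mu):
        weights.append(weights[-1] * birth / death)   # p_{i+1} = p_i * lam_i / mu_{i+1}
    total = sum(weights)
    return [w / total for w in weights]

lam = [1.0, 1.0, 1.0, 1.0]      # constant birth rate, as in the corollary
mu = [2.0, 2.0, 2.0, 2.0]
p = stationary_birth_death(lam, mu)
size = len(p)

# Infinitesimal matrix L of the truncated chain; rows sum to zero.
L = [[0.0] * size for _ in range(size)]
for i in range(size - 1):
    L[i][i + 1] = lam[i]
    L[i + 1][i] = mu[i]
for i in range(size):
    L[i][i] = -sum(L[i])

# Condition (5) for every pair of states, and stationarity sum_i p_i L_ij = 0.
detailed_balance = all(abs(p[i] * L[i][j] - p[j] * L[j][i]) < 1e-12
                       for i in range(size) for j in range(size))
stationary = all(abs(sum(p[i] * L[i][j] for i in range(size))) < 1e-12
                 for j in range(size))
print(detailed_balance, stationary)
```

Because only neighbouring states communicate, the only non-trivial instances of (5) are $p_i\lambda_i = p_{i+1}\mu_{i+1}$, which is exactly how the stationary weights are built.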


3. Consider an unsaturated queue of type M/M/s (Poisson input, s counters, 
exponential service time, first come, first served), in equilibrium. If n(t) is the 
sum of the number of customers on queue, plus those being served, then n(t) 
is a process of type B(t), in which customers’ arrivals correspond to births, and 
departures to deaths. By considering the reversibility of n(t), guaranteed by the 
corollary to Theorem 3, the following is now clear: 

THEOREM 4. (a) The sequence of departure times forms a Poisson process. (b)
The value of n(t) is independent of all past departure times. (c) If $t_0$ is a departure
time, then $n(t_0 + 0)$ is independent of all past departure times.

Note. The above results are, of course, true for more general queue disciplines.
The number of servers, instead of being fixed, may be permitted to vary as a
specified function of the number of customers present. Also, instead of "first
come, first served," other disciplines, e.g., random service or "last come, first
served," will do without effect on the results.
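Part (a) of Theorem 4 can be probed by simulation. The sketch below simulates a single-counter queue (s = 1) with illustrative rates λ = 1 and μ = 2 and checks only two marginal features of the interdeparture times, a mean near 1/λ and a squared coefficient of variation near 1; it is a sanity check under these assumptions, not a verification of the full Poisson property.

```python
import random

def mm1_interdeparture_times(lam, mu, customers, seed=0):
    """Interdeparture times of a first come, first served M/M/1 queue."""
    rng = random.Random(seed)
    arrival = 0.0
    prev_departure = 0.0
    gaps = []
    for _ in range(customers):
        arrival += rng.expovariate(lam)            # Poisson input
        start = max(arrival, prev_departure)       # wait for the counter if busy
        departure = start + rng.expovariate(mu)    # exponential service time
        gaps.append(departure - prev_departure)
        prev_departure = departure
    return gaps[customers // 10:]                  # drop an initial warm-up stretch

gaps = mm1_interdeparture_times(lam=1.0, mu=2.0, customers=200_000)
mean_gap = sum(gaps) / len(gaps)
var_gap = sum((g - mean_gap) ** 2 for g in gaps) / len(gaps)
cv2 = var_gap / mean_gap ** 2                      # equals 1 for an exponential law
print(round(mean_gap, 3), round(cv2, 3))
```

The queue starts empty instead of in equilibrium, which is why an initial stretch of departures is discarded before the statistics are formed.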

Suppose the customers, after departing from a first queue of type M/M/s,
enter a second multiple-counter queue, where they are served first come, first
served, with exponential service time. Such a combination of two tandem queues
will be referred to as a o-system. It follows, from Theorem 4b, that if $n_1(t)$, $n_2(t)$
refer, respectively, to the first and second queues of a o-system, then $n_1(t)$ and
$n_2(\tau)$ are independent, $\tau \le t$. This was first proved in the special case s = 1,
$t = \tau$, by Jackson [6].


³ Heuristic forms of the necessary argument date to P. and T. Ehrenfest [3]. The above
proof is in the spirit of the Ehrenfests' reasoning. Simple algebraic verifications are also
possible, but they leave the situation less lucid. The condition $\sum p_i(\lambda_i + \mu_i) < \infty$ is actually
superfluous.

In what follows, the term waiting time will be used to refer to the time elapsed
between a customer's arrival and departure, the service time included. Let $T_1$
and $T_2$ denote a customer's waiting time at the first and second queues of a
o-system, respectively.

THEOREM 5. If s = 1, then $T_1$ and $T_2$ are independent.

Proof. Let $n_1$ be the number of customers at the first queue the instant after a
customer C departs, and let $n_2$ be the number of customers C finds at the second
queue (customers being served included). As a corollary of Theorem 4c, $n_1$ and
$n_2$ are independent. Let


    $A(t; k) = \Pr\{T_1 \le t \mid n_2 = k\}.$


If λ is the number of customers arriving per unit time, then $n_1$ is the number of
Poisson events of density λ that occurred during the waiting period $T_1$. We have


    $\Pr\{n_1 = j \mid T_1 = t, n_2 = k\} = e^{-\lambda t}(\lambda t)^j / j!.$

Therefore


    $E\{s^{n_1} \mid n_2 = k\} = \int_0^{\infty} e^{-\lambda t(1 - s)}\, dA(t; k).$


Now the left side is independent of k. Therefore A(t; k) does not depend on k.
Hence $n_2$, and consequently also $T_2$, are independent of $T_1$.


4. We will now consider the queues of type $E_j / E_k / s$ (interarrival and service
periods normalized chi-square with 2j and 2k degrees of freedom, respectively
[7]), and show that Theorem 4a cannot be generalized further, in a certain
direction. Note that both when j = k = 1, and j = k = ∞, the departure epochs
of an $E_j / E_k / s$ queue are again $E_j$. (The case j = k = ∞ corresponds to a periodic
input with constant service time.) One may therefore ask⁴ whether this state
of affairs holds whenever j = k. However, Theorem 6, below, shows this to be
false.

THEOREM 6. The departure epochs of an $E_2 / E_2 / 1$ process are not an $E_2$ process.

Proof. For the case under consideration we have x = interarrival period =
$x_1 + x_2$, where $x_i$, i = 1, 2, are independent, with


    $E\{e^{-s x_i}\} = \frac{\lambda}{\lambda + s}, \qquad \lambda > 0, \quad i = 1, 2.$


⁴ This question is related to the asymptotic behavior of a large number of queues in
tandem, each with $E_k$-type service time, k fixed.
Similarly, the service periods, y, are of the form $y = y_1 + y_2$, where $y_i$, i = 1, 2,
are independent, and

    $E\{e^{-s y_i}\} = \frac{\mu}{\mu + s}, \qquad \mu > 0, \quad \rho = \lambda/\mu < 1, \quad i = 1, 2.$

At a given instant the entrance will be said to be in state 1 (2) if the system is
in the portion $x_1$ ($x_2$) of an interarrival period. Consider instants just following
a departure. Let $A_i$ = Pr{entrance is in state i, and there are 0 customers left
behind}, i = 1, 2. Let τ be the length of an interdeparture period. If the departure
epochs formed an $E_2$ process it would follow that τ had the same marginal
distribution as x, that is,


    $\left(\frac{\lambda}{\lambda + s}\right)^2 = E\{e^{-s\tau}\} = \left(\frac{\mu}{\mu + s}\right)^2\left[(1 - A_1 - A_2) + A_1\left(\frac{\lambda}{\lambda + s}\right)^2 + A_2\,\frac{\lambda}{\lambda + s}\right].$


Multiplying both sides by $(\lambda + s)^2(\mu + s)^2$, and equating coefficients of $s^2$, we have

    $P_0$ = Pr{0 customers are left behind by a departing customer}
          $= A_1 + A_2 = 1 - \rho^2.$


However this is incorrect, as it differs from Volberg's [9] formula for $P_0$. Thus
we have a contradiction.

A related question is that of the possibility of imbedding n(t) in a reversible
Markov process, e.g., for s = 1. To this end we define the "pseudostate" $\tilde{n}(t)$ of
an $E_j / E_k / 1$ queue. We shall say that a(t) = r, r = 0, 1, 2, ⋯, j − 1, if the
(r + 1)st stage of the interarrival period is in progress. Similarly, put b(t) = r,
r = 0, 1, ⋯, k − 1, if the (r + 1)st stage of the service period is in progress;
if the counter is empty, b(t) = 0. Define


(6)    $\tilde{n}(t) = n(t) + \frac{a(t)}{j} - \frac{b(t)}{k}.$

The realizations of the process $\tilde{n}(t)$ are constant except for jumps of height
1/j, upward, and jumps of height 1/k, downward. We make the following
observation.

THEOREM 7⁵. If j and k are relatively prime, $\tilde{n}(t)$ is a Markov process.

Proof. If the hypothesis is satisfied, n(t), a(t), b(t) can be recovered from
a knowledge of $\tilde{n}(t)$.
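The recovery step in the proof can be made concrete. The sketch below assumes the pseudostate has the form n + a/j − b/k with 0 ≤ a < j, 0 ≤ b < k, and b = 0 whenever n = 0; the function names are illustrative. It lists every triple (n, a, b) compatible with a given pseudostate value, so a unique match means the triple is recoverable.

```python
from fractions import Fraction
from math import gcd

def pseudostate(n, a, b, j, k):
    """Value n + a/j - b/k, kept exact with rational arithmetic."""
    return Fraction(n) + Fraction(a, j) - Fraction(b, k)

def recover(value, j, k):
    """All (n, a, b) with 0 <= a < j, 0 <= b < k, n >= 0 (and b == 0 when the
    counter is empty, n == 0) whose pseudostate equals value."""
    matches = []
    for a in range(j):
        for b in range(k):
            n = value - Fraction(a, j) + Fraction(b, k)
            if n.denominator == 1 and n >= 0 and (n > 0 or b == 0):
                matches.append((int(n), a, b))
    return matches

# j = 2, k = 3 are relatively prime: every state is recovered uniquely.
assert gcd(2, 3) == 1
unique = all(recover(pseudostate(n, a, b, 2, 3), 2, 3) == [(n, a, b)]
             for n in range(4) for a in range(2) for b in range(3)
             if n > 0 or b == 0)

# j = k = 2 share a factor: the value 1/2 is ambiguous.
ambiguous = recover(Fraction(1, 2), 2, 2)
print(unique, ambiguous)
```

With j = k = 2 the states (n, a, b) = (1, 0, 1) and (0, 1, 0) both give the value 1/2, which shows why the coprimality hypothesis is needed for this argument.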

We conclude that if j and k are relatively prime, and j = k, then $\tilde{n}(t)$ is
reversible. Since j = k = 1 is the only admissible possibility, the special nature of
the $E_1 / E_1 / 1$ queue is seen in a new light.

A straightforward computation shows that the following partial converse of 
Theorem 4a holds. 


THEOREM 8. If the arrival and departure epochs of a single-counter queue are both
Poisson, then the service time distribution is exponential, or a step function at 0.


⁵ This fact enables one to study the transient behavior of n(t) for $E_j/E_k/1$, j, k relatively
prime. We shall not explore this further at this time, however.

The author has had valuable discussions with A. W. Marshall and T. E. Harris 
in connection with this work. 
ON THE POWER OF OPTIMUM TOLERANCE REGIONS WHEN
SAMPLING FROM NORMAL DISTRIBUTIONS¹


By Irwin Guttman
University of Alberta


1. Introduction and Summary. In [1], optimum β-expectation tolerance regions
were found by reducing the problem to that of solving an equivalent hypothesis
testing problem. The regions produced when sampling from a k-variate normal
distribution were found to be of similar β-expectation and optimum in the sense
of minimax and most stringency. It is the purpose of this paper to discuss the
"Power" or "Merit" of such regions, when sampling from the k-variate normal
distribution.

Let $X = (X_1, \cdots, X_n)$ be a random sample point in n dimensions, where
each $X_i$ is an independent observation, distributed by $N(\mu, \sigma^2)$. It is often
desirable to estimate on the basis of such a sample point a region which contains
a given fraction β of the parent distribution. We usually seek to estimate the
center 100β% of the parent distribution and/or the 100β% left-hand tail of the
parent distribution.


Received April 20, 1956; revised December 17, 1956. 
1 Research supported by the University of Alberta General Research Fund. 
2. Formulation of the Power Function. Suppose sampling from $N(\mu, \sigma^2)$, where
Case I: $\mu$, $\sigma^2$ unknown. For this case the solution of the equivalent hypothesis
testing problem (as formulated on p. 171 of [1]) is given by


    $\phi_y(x_1, \cdots, x_n) = 1$  if $|W| \le a_\beta$,
                              $= 0$  if $|W| > a_\beta$,


where $W = (y - \bar{x})/s_x$, $\bar{x} = n^{-1}\sum x_i$, $s_x^2 = (n - 1)^{-1}\sum (x_i - \bar{x})^2$,
and $\phi_y(x_1, \cdots, x_n)$ is the characteristic function of the minimax most stringent
tolerance region $S(X_1, \cdots, X_n)$. The $a_\beta$ are constants chosen to give


    $S(x_1, \cdots, x_n) = [\bar{x} - a_\beta s_x,\ \bar{x} + a_\beta s_x]$


size β, and are tabulated in Table I of [1].
The power of φ (as defined on p. 170 of [1]) and hence of S is determined by
the distribution of W under the alternative of the equivalent hypothesis testing
problem. That is, we have


(2.1)    Power $= P_{\mathrm{Alt.}}(|W| \le a_\beta) = P_{\mathrm{Alt.}}\left(\frac{|W|}{(a^2 + n^{-1})^{1/2}} \le \frac{a_\beta}{(a^2 + n^{-1})^{1/2}}\right).$

Now, under the alternative, $y - \bar{x}$ has variance $(a^2 + n^{-1})\sigma^2$. Thus,
$W/(a^2 + n^{-1})^{1/2}$, under the alternative, is the Student's "T" variable with (n − 1)
degrees of freedom. Hence


(2.2)    Power $= P\left(|T| \le \frac{a_\beta}{(a^2 + n^{-1})^{1/2}}\right).$


The power measures the "degree of confidence" we have in $S(X_1, \cdots, X_n)$
of covering the centre 100β% of $N(\mu, \sigma^2)$, when the "desirability" of covering
the centre 100β% set is given by


    $Q_{\mu,\sigma}(S) = \int_S dN(\mu, a^2\sigma^2), \qquad 0 < a < 1.$


For example, if it is 99.1% desirable to cover the 95% center part of $N(\mu, \sigma^2)$,
then a = 3/4, and the power is found by (2.2) using a = 3/4. Values of the power
for the regions S, where the desirability of the 100β% sets is .99, are given for
β = .75, .90, .95 and .975, in Table I.
As an example, consider forming S on the basis of a sample of 7. Then from the
tables in [1],

    $S = [\bar{x} - 2.616\, s_x,\ \bar{x} + 2.616\, s_x].$


Now, suppose we wish to have 99% confidence that $S(x_1, \cdots, x_7)$ contains
95% of $N(\mu, \sigma^2)$. Then, the "confidence" that S covers 95% of the parent
distribution, that is the power of S, is found by entering Table I for n = 7 in the
β = .95 column. The power is found to be .9772. That is, if $X_1, \cdots, X_7$, Y
are independent normally distributed chance variables, $X_1, \cdots, X_7$ having
identical distributions with mean μ and standard deviation σ, Y having mean
μ and standard deviation aσ, then


    $\Pr(\bar{X} - 2.616\, s_x < Y < \bar{X} + 2.616\, s_x) = .95$    if a = 1,
                                                       $= .9772$  if a = .760906.


TABLE I

Power of β-expectation tolerance regions,
$[\bar{x} - a_\beta s_x,\ \bar{x} + a_\beta s_x]$

Measure of Desirability = .99

    a = .870167 (β = .975),  a = .760906 (β = .95)

    β = .95 column:  .9545  .9618  .9682  .9733  .9772  .9809  .9847  .9863  .9872  .9881  .9890

Case II. Mean unknown, variance known. The minimax and most stringent
tolerance region $S(X_1, \cdots, X_n)$ of similar β-expectation is given by


    $S(x_1, \cdots, x_n) = [\bar{x} - b_\beta\sigma,\ \bar{x} + b_\beta\sigma],$


where $\sigma^2$ is the known value of the variance, and $b_\beta$ are constants chosen to give S
size β. Using the same procedure as for Case I, we have


(2.3)    Power $= P\left(|Z| \le \frac{b_\beta}{(a^2 + n^{-1})^{1/2}}\right),$


where Z is the standard normal variate. Values of the power for this case are
given in Table II.
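The known-variance case lends itself to a short numerical sketch, since it needs only the normal distribution. Two assumptions are made here and are not taken from the source: the scale a is obtained by requiring $N(\mu, a^2\sigma^2)$ to put the stated desirability on the centre 100β% interval of $N(\mu, \sigma^2)$, and $b_\beta$ is taken as $z_{(1-\beta)/2}(1 + n^{-1})^{1/2}$, the value that makes $[\bar{x} - b_\beta\sigma, \bar{x} + b_\beta\sigma]$ cover a future observation with probability β. Function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal variate

def scale_for_desirability(beta, desirability):
    """a such that N(mu, a^2 sigma^2) assigns `desirability` to the centre
    100*beta% interval of N(mu, sigma^2)."""
    return Z.inv_cdf((1 + beta) / 2) / Z.inv_cdf((1 + desirability) / 2)

def desirability_for_scale(beta, a):
    """Inverse relation: desirability implied by a given scale a."""
    return 2 * Z.cdf(Z.inv_cdf((1 + beta) / 2) / a) - 1

def power_known_variance(beta, a, n):
    """Formula (2.3), with b_beta = z * sqrt(1 + 1/n) (an assumption here)."""
    b_beta = Z.inv_cdf((1 + beta) / 2) * sqrt(1 + 1 / n)
    return 2 * Z.cdf(b_beta / sqrt(a * a + 1 / n)) - 1

a95 = scale_for_desirability(0.95, 0.99)
print(round(a95, 6), round(desirability_for_scale(0.95, 0.75), 3))
print([round(power_known_variance(0.95, a95, n), 4) for n in (5, 20, 100)])
```

Under these assumptions the scale for β = .95 and desirability .99 agrees with the value .760906 used in the tables, a = 3/4 corresponds to a desirability of about 99.1%, and the power rises toward .99 as n grows.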


Case III. Mean known, variance unknown. The minimax and most stringent
tolerance region $S(X_1, \cdots, X_n)$ of similar β-expectation is


    $S(x_1, \cdots, x_n) = [\mu - t_{(1-\beta)/2}\, s'_x,\ \mu + t_{(1-\beta)/2}\, s'_x],$

where μ is the known value of the mean, $t_\alpha$ is the point exceeded with probability
α by the Student's "T" variable with n degrees of freedom. Using a similar
procedure as above, the power of these regions is clearly given by


    Power $= P(|T| \le t_{(1-\beta)/2}/a),$

where T is the Student's variable with n degrees of freedom. Values of this power
are given in Table III. $s'_x$ is defined as $\bigl[n^{-1}\sum (x_i - \mu)^2\bigr]^{1/2}$.


TABLE II

Power of β-expectation tolerance regions,
$[\bar{x} - b_\beta\sigma,\ \bar{x} + b_\beta\sigma]$

Measure of Desirability = .99

    a = .870167,  a = .760906

    Surviving column:  .9887  .9891  .9893  .9896  .9898


TABLE III

Power of β-expectation tolerance regions,
$[\mu - t_{(1-\beta)/2}\, s'_x,\ \mu + t_{(1-\beta)/2}\, s'_x]$, where $s'^2_x = n^{-1}\sum (x_i - \mu)^2$

Measure of Desirability = .99

    a       .870167   .760906   .638572
    β       .975      .95       .90       .75
    n
            .9765     .9578     .9280     .8651
            .9787     .9678     .9535     .9243
            .9806     .9751     .9626     .9501
            .9821     .9771     .9695     .9581
            .9832     .9787     .9747     .9644
            .9847     .9811     .9779     .9733
            .9863     .9838     .9814     .9787
            .9879     .9865     .9851     .9835
            .9889     .9881     .9873     .9864
            .9892     .9887     .9882     .9875
    120     .9896     .9893     .9891     .9887


3. Sampling from a k-variate normal distribution. Consider the case of sampling
from a multivariate normal distribution


    $c \exp[-\tfrac{1}{2}(w - \mu)A(w - \mu)'],$


where $\mu = (\mu_1, \cdots, \mu_k)$ and $A^{-1}$ is the variance covariance matrix of
$w = (X_1, \cdots, X_k)$. Suppose μ and $A^{-1}$ are unknown, and suppose it is desired to form
regions S which cover the centre part of the parent distribution. The choice of
the measure of desirability is


(3.1)    $Q_{\mu, A^{-1}} = N(\mu, a^2 A^{-1}), \qquad 0 < a < 1.$


It was shown in [1] that the solution of forming these regions, that is, of an
equivalent hypothesis testing problem (see p. 176 of [1]) is


(3.2)    $\phi_\xi(w_1, \cdots, w_n) = 1$  when $(\xi - \bar{w})\mathcal{A}^{-1}(\xi - \bar{w})' \le c_\beta$,
                                  $= 0$  when $(\xi - \bar{w})\mathcal{A}^{-1}(\xi - \bar{w})' > c_\beta$,

where $\bar{w} = n^{-1}\sum_{\alpha=1}^{n} w_\alpha$,

    $\mathcal{A} = (n - 1)^{-1}\sum_{\alpha=1}^{n} (w_\alpha - \bar{w})'(w_\alpha - \bar{w}),$

and $c_\beta$ are constants chosen to give the ellipsoidal region

(3.3)    $S(w_1, \cdots, w_n) = \{\xi \mid (\xi - \bar{w})\mathcal{A}^{-1}(\xi - \bar{w})' \le c_\beta\}$


size β, that is, β-expectation. These regions were found to be minimax and most
stringent. Letting $y^2 = (\xi - \bar{w})\mathcal{A}^{-1}(\xi - \bar{w})'$, the power clearly takes the form


(3.4)    Power $= P_{\mathrm{Alt.}}\{y^2 \le c_\beta\} = P\left\{T^2 \le \frac{c_\beta}{a^2 + n^{-1}}\right\},$

where $T^2$ is Hotelling's $T^2$ variable with (n − 1) degrees of freedom, and Alt.
refers to the Alternative hypothesis in the formulation of p. 176 of [1].
By making the transformation


    $T^2 = \frac{(n - 1)k}{n - k}\, F,$


it is well known that Hotelling's $T^2$ distribution goes into Fisher's F distribution
with k, n − k degrees of freedom. That is, the power of S is given by


    Power $= P\left(F \le \frac{n - k}{k(n - 1)} \cdot \frac{c_\beta}{a^2 + n^{-1}}\right).$

In [1], it was shown that $c_\beta = (1 + n^{-1})(n - 1)\bigl(k/(n - k)\bigr)F_{1-\beta}$, where
$F_{1-\beta}$ is the point exceeded with probability 1 − β using the F distribution with
k, n − k degrees of freedom. Hence the regions (3.3) have power given by


(3.5)    Power $= P\left(F \le \frac{1 + n^{-1}}{a^2 + n^{-1}}\, F_{1-\beta}\right).$

Values of the power function (3.5) are given for the case of sampling from the
bi-variate normal distribution (k = 2), when the correlation coefficient ρ is zero,
and desirability of the centre 100β% sets is .99, in Table IV.


TABLE IV

Power of β-expectation tolerance regions,
$(\xi - \bar{w})\mathcal{A}^{-1}(\xi - \bar{w})' \le c_\beta$

Measure of Desirability = .99

    a       .79697    .$3403
    β       .95       .75

            .9531     .7810
            .9598     .8502
            .9661     .9022
            .9752     .9291
            .9794     .9578
            .9845     .9770
            .9865     .9812
            .9866     .9815
            .9868     .9818
THE CONVERGENCE OF CERTAIN FUNCTIONS OF
SAMPLE SPACINGS¹


By Lionel Weiss

Cornell University


1. Introduction and summary. Suppose $g(u_1, \cdots, u_k)$ is a continuous function
of its arguments, homogeneous of order r, monotonic nondecreasing in each of its
arguments, which is positive whenever each of its arguments is positive, and is
such that for any given K (0 < K < ∞), there is a number R(K) (0 < R(K) < ∞)
such that $g(u_1, \cdots, u_k) \le K$ and $u_1 \ge 0, \cdots, u_k \ge 0$ imply that
$u_1 + \cdots + u_k < R(K)$.

Received July 23, 1956.

¹ Research under contract with the Office of Naval Research. It may be reproduced in

Let $U_1, \cdots, U_k$ be chance variables with joint density $e^{-(u_1 + \cdots + u_k)}$ for
$u_1 \ge 0, \cdots, u_k \ge 0$, and zero elsewhere. For any t, define U(t) as
$P[g(U_1, \cdots, U_k) \le t]$. We note that U(t) is a continuous distribution function,
with U(0) = 0.

Let ρ(v) be a bounded nonnegative function with a finite number of discontinuities,
defined for 0 ≤ v ≤ 1. Suppose $X_1, X_2, \cdots, X_n$ are independently
and identically distributed chance variables, each with density f(x), f(x) being
bounded, and having a finite number of discontinuities and oscillations. F(x)
denotes $\int_{-\infty}^{x} f(z)\, dz$. Define $Y_1 \le Y_2 \le \cdots \le Y_n$ as the ordered values of
$X_1, \cdots, X_n$, and define $T_i$ as $Y_{i+1} - Y_i$ (i = 1, ⋯, n − 1). Let $R_n(t)$ denote
the proportion of the values


    $\rho\left(\tfrac{1}{n}\right) g(T_1, \cdots, T_k),\ \rho\left(\tfrac{2}{n}\right) g(T_2, \cdots, T_{k+1}),\ \cdots,\ \rho\left(\tfrac{n-k}{n}\right) g(T_{n-k}, \cdots, T_{n-1})$


which are less than or equal to $t/n^r$.
Let $\bar{U}[[t f^r(x)]/\{\rho[F(x)]\}]$ be defined as follows. If f(x) = 0,


    $\bar{U}[[t f^r(x)]/\{\rho[F(x)]\}] = 0$


regardless of the value of t. If x is such that f(x) > 0 and ρ[F(x)] = 0, then
$\bar{U}[[t f^r(x)]/\{\rho[F(x)]\}] = 1$ regardless of the value of t. If f(x) > 0 and ρ[F(x)] > 0,
then $\bar{U}[[t f^r(x)]/\{\rho[F(x)]\}] = U[[t f^r(x)]/\{\rho[F(x)]\}]$. Let S(t) denote


    $\int_{-\infty}^{\infty} \bar{U}[[t f^r(x)]/\{\rho[F(x)]\}]\, f(x)\, dx,$


and let V(n) denote $\sup_{t \ge 0} |R_n(t) - S(t)|$. Then V(n) converges to zero
stochastically as n increases. This generalizes the result of [1], where k = 1,
$g(u_1) = u_1$, ρ(v) = 1. The present result may be used to construct tests of fit
in the presence of unknown location and scale parameters.
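The special case k = 1, g(u) = u, ρ(v) = 1 with a uniform density can be illustrated by simulation: the spacings multiplied by n should then have an empirical distribution close to $1 - e^{-t}$. The sketch below computes the largest deviation for one simulated sample; the sample size, seed and function name are assumptions chosen for illustration.

```python
import math
import random

def max_gap_to_exponential(n, seed=0):
    """sup_t |R_n(t) - (1 - exp(-t))| for uniform observations with
    k = 1, g(u) = u, rho(v) = 1, so that r = 1."""
    rng = random.Random(seed)
    ys = sorted(rng.random() for _ in range(n))
    # n * T_i for the n - 1 spacings between consecutive ordered values
    scaled = sorted(n * (b - a) for a, b in zip(ys, ys[1:]))
    m = len(scaled)
    worst = 0.0
    for i, t in enumerate(scaled):
        s = 1 - math.exp(-t)
        # the empirical proportion jumps from i/m to (i+1)/m at t
        worst = max(worst, abs((i + 1) / m - s), abs(i / m - s))
    return worst

deviation = max_gap_to_exponential(20_000)
print(round(deviation, 4))
```

The supremum is evaluated at the jump points of the empirical proportion, which is sufficient because the limiting function is continuous and increasing.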


2. Proof of the convergence of V(n). 

LEMMA 1. If for each given positive t, $R_n(t)$ converges to S(t) stochastically as n
increases, then V(n) converges to zero stochastically as n increases.

Proof. S(t) is continuous for all t > 0, and is continuous on the right at t = 0.
$S(0+) = \int_{\rho[F(x)] = 0} f(x)\, dx$. It is easily seen that $R_n(0)$ converges to
$\int_{\rho[F(x)] = 0} f(x)\, dx$ with probability one as n increases. The rest of the proof of the
lemma is almost exactly the same as the proof of Lemma 1 of [1].

LEMMA 2. Let $X_1, X_2, \cdots, X_n$ be independent chance variables, each with a
uniform distribution on [0, 1]. Let M denote the number of these variables falling
in the closed interval [C, D], where 0 ≤ C < D ≤ 1, and let $Y_1 \le Y_2 \le \cdots \le Y_M$
denote the ordered values of the variables in [C, D]. Define $W_1 = Y_2 - Y_1, \cdots,
W_{M-1} = Y_M - Y_{M-1}$. For a given positive t, define L(n, t) as the total number of
values of $g(W_1, \cdots, W_k)$, $g(W_2, \cdots, W_{k+1})$, ⋯, $g(W_{M-k}, \cdots, W_{M-1})$ which
are not greater than $t/n^r$. Then [L(n, t)]/n converges to (D − C)U(t) stochastically
as n increases.

Proof. Define $Z_i$ to be one if $g(W_i, \cdots, W_{i+k-1}) \le t/n^r$, and zero otherwise.
M/n converges to (D − C) with probability one as n increases. The conditional
distribution given M of $Q_1 = MW_{i_1}$,


    $Q_2 = MW_{i_2}, \cdots, Q_L = MW_{i_L}$  $(1 \le i_1 < i_2 < \cdots < i_L \le M - 1)$


is easily verified to be


    $\left[D - C - \frac{q_1 + \cdots + q_L}{M}\right]^{M-L} \frac{M!}{M^L (D - C)^M (M - L)!}$

for $q_1 + \cdots + q_L \le M(D - C)$, and zero elsewhere. As M increases, this density
approaches $[1/(D - C)^L] \exp\{-(q_1 + \cdots + q_L)/(D - C)\}$ uniformly in any
region where $q_1 + \cdots + q_L < K < \infty$. We note that under this limiting density,
$Q_1, \cdots, Q_L$ are independent. To say that $g(W_i, \cdots, W_{i+k-1}) \le t/n^r$ is the same
as saying that $g((n/M)MW_i, \cdots, (n/M)MW_{i+k-1}) \le t$, and as n increases
the probability of this last occurrence approaches the probability that
$g((MW_i)/(D - C), \cdots, (MW_{i+k-1})/(D - C)) \le t$. Since M approaches
infinity with probability one as n increases, and from the restrictions on
$g(u_1, \cdots, u_k)$ given in Sec. 1, we can use the limiting distribution of
$MW_i, \cdots, MW_{i+k-1}$ to compute the limiting


    $P[g((MW_i)/(D - C), \cdots, (MW_{i+k-1})/(D - C)) \le t],$

and we get U(t) as this limiting probability.


    $\frac{L(n, t)}{n} = \frac{M}{n} \cdot \frac{Z_1 + \cdots + Z_{M-k}}{M},$


and from the considerations above, it is easily seen that E{L(n, t)/n} approaches
(D − C)U(t) as n increases.


Next we show that


    $E\left\{\left[\frac{L(n, t)}{n} - E\left\{\frac{L(n, t)}{n}\right\}\right]^2\right\}$


approaches zero as n increases, which will complete the proof of Lemma 2. The
expectation in question is equal to


    $(1/n^2)\, E\Bigl\{\sum_{i=1}^{M-k} (Z_i - EZ_i)^2\Bigr\} + (1/n^2)\, E\Bigl\{\sum\sum_{i \ne j} (Z_i - EZ_i)(Z_j - EZ_j)\Bigr\}.$


$\{Z_i\}$ are uniformly bounded variables, and M − k < n, therefore the first term in
this last expression certainly approaches zero as n increases. As for
$\sum\sum_{i \ne j} (Z_i - EZ_i)(Z_j - EZ_j)$, any such term with |i − j| > k has $Z_i$ and $Z_j$
defined in terms of entirely different and nonoverlapping sets of W's, and by the
result on the independence of Q's derived above, if |i − j| > k,
$E(Z_i - EZ_i)(Z_j - EZ_j)$ must approach zero as n increases. But the number of terms
$E(Z_i - EZ_i)(Z_j - EZ_j)$ with |i − j| ≤ k is less than 2kn. From these considerations,
it follows easily that


    $E\left\{\left[\frac{L(n, t)}{n} - E\left\{\frac{L(n, t)}{n}\right\}\right]^2\right\}$


approaches zero as n increases.

Now we turn to the proof of the stochastic convergence of V(n). For simplicity,
we assume that both f(x) and ρ(v) are continuous, for the time being.
Given any positive ε, we can find H intervals $I_1 = (-\infty, z_1)$, $I_2 = (z_1, z_2)$,
$I_3 = (z_2, z_3)$, ⋯, $I_H = (z_{H-1}, \infty)$, such that the variation of f(x) and of
ρ[F(x)] in each of these intervals is less than ε. Denote $\inf_{x \in I_i}\{f(x)\}$ by $g_i$,
$\sup_{x \in I_i}\{f(x)\}$ by $G_i$, $\inf_{x \in I_i}\{\rho[F(x)]\}$ by $h_i$, $\sup_{x \in I_i}\{\rho[F(x)]\}$ by $H_i$. Let
$M_i$ denote the number of variables $X_1, X_2, \cdots, X_n$ that fall in $I_i$. Define
$L_i(n, t)$ in terms of the $M_i$ variables falling in $I_i$ just as L(n, t) was defined in
terms of the variables falling in [C, D] in Lemma 2. Define $\bar{L}_i(n, t)$ in the same
way, except that each variable $X_j$ is replaced by $F(X_j)$. Since $F(X_j)$ has a uniform
distribution, Lemma 2 states that $[\bar{L}_i(n, t)]/n$ converges stochastically to
$[F(z_i) - F(z_{i-1})]U(t)$ as n increases, where $z_0$ denotes −∞, $z_H$ denotes ∞. Also,
since $F(Y_{j+1}) - F(Y_j) = f(\theta)[Y_{j+1} - Y_j]$, $Y_j \le \theta \le Y_{j+1}$, and from the
assumptions about $g(u_1, \cdots, u_k)$, we have $\bar{L}_i(n, g_i^r t) \le L_i(n, t) \le \bar{L}_i(n, G_i^r t)$.
$(M_1 + \cdots + M_i)/n$ converges to $F(z_i)$ with probability one as n increases,
therefore the probability approaches one that


    $\frac{\sum_{i=1}^{H} L_i(n, t/H_i)}{n} - \varepsilon \le R_n(t) \le \frac{\sum_{i=1}^{H} L_i(n, t/h_i)}{n} + \varepsilon.$


This implies that the probability approaches one that


    $\sum_{i=1}^{H} [F(z_i) - F(z_{i-1})]\, U\!\left(\frac{g_i^r t}{H_i}\right) - \varepsilon \le R_n(t) \le \sum_{i=1}^{H} [F(z_i) - F(z_{i-1})]\, U\!\left(\frac{G_i^r t}{h_i}\right) + \varepsilon.$

But by taking ε small enough (i.e., increasing H properly) the two extremes of
this last inequality approach S(t), proving the stochastic convergence of V(n).

In the case where ρ(v) and/or f(x) have discontinuities, we can enclose the
points of discontinuity in intervals whose total probability is arbitrarily small,
change ρ[F(x)] and f(x) within these intervals to remove the discontinuities, and
use the results above. The theorem would follow from a realization that the
probability structure would be changed very little. The same device could be
used to extend the results to cases where f(x) is unbounded.


3. Application of results to tests of fit. First we prove the following lemma:
If F(x) and G(x) are continuous distribution functions with density functions f(x)
and g(x) respectively, then $f[F^{-1}(x)] = c\,g[G^{-1}(x)]$ for some c > 0 if and only if
F(x) = G(cx + b) for some constant b. To prove this, we note that the fact
that $F[F^{-1}(x)] = x$ gives by differentiation that $f[F^{-1}(x)] \cdot (d/dx)F^{-1}(x) = 1$,
and $g[G^{-1}(x)] \cdot (d/dx)G^{-1}(x) = 1$. Thus, if $f[F^{-1}(x)] = c\,g[G^{-1}(x)]$, then
$c\,(d/dx)F^{-1}(x) = (d/dx)G^{-1}(x)$, so $cF^{-1}(x) = G^{-1}(x) + B$, for some constant
B. Letting x = F(y), we get $cy = G^{-1}[F(y)] + B$, or $cy - B = G^{-1}[F(y)]$, or
G(cy − B) = F(y). Conversely, if G(cx + b) = F(x), then c g(cx + b) = f(x),
while $cx + b = G^{-1}[F(x)]$, so that $c\,g(G^{-1}[F(x)]) = f(x)$, or setting y = F(x),
$c\,g(G^{-1}(y)) = f[F^{-1}(y)]$, completing the proof of the lemma.
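One direction of the lemma can be checked numerically for a familiar family. The sketch below takes G to be the standard normal distribution and F(x) = G(cx + b), which is again normal; the constants c = 2 and b = −3 and the probability levels are arbitrary choices for illustration. The ratio $f[F^{-1}(u)]/g[G^{-1}(u)]$ should then equal c at every level u.

```python
from statistics import NormalDist

G = NormalDist(0.0, 1.0)
c, b = 2.0, -3.0

# F(x) = G(c*x + b) is the normal distribution with mean -b/c and
# standard deviation 1/c.
F = NormalDist(-b / c, 1.0 / c)

levels = (0.1, 0.25, 0.5, 0.75, 0.9)
ratios = [F.pdf(F.inv_cdf(u)) / G.pdf(G.inv_cdf(u)) for u in levels]
print([round(x, 6) for x in ratios])
```

The location constant b drops out of the ratio entirely, which is the property that later makes the test statistic free of the location parameter.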

Now we examine the theorem of Sec. 2 for the special case k = 1, g(u) = u
(therefore r = 1), and $\rho(v) = (1/\beta) f[F^{-1}(v)]$, where β is a positive constant.
Then $U(t) = 1 - e^{-t}$, and $S(t) = \int_{-\infty}^{\infty} (1 - e^{-\beta t}) f(x)\, dx = 1 - e^{-\beta t}$, and thus
does not depend on f(x). Suppose we are confronted with the following problem
in hypothesis testing: $X_1, X_2, \cdots, X_n$ are known to be independent and
identically distributed chance variables, with a continuous distribution, and the
hypothesis is that the common distribution function is F(cx + b) for some unknown
constants c, b (c > 0), where the form of F(x) is known. Here c is a scale
parameter, and b is a location parameter. We are going to examine the properties
of the test which rejects the hypothesis when $\inf_{\beta > 0} \sup_{t \ge 0} |R_n(t) - (1 - e^{-\beta t})|$
is "too large." We are going to show that this last expression converges
stochastically to zero if and only if the hypothesis is true, so that the test is
consistent. Also, when the hypothesis is true, the distribution of the expression is
independent of the parameters c, b.

When the hypothesis is true, there is some α > 0 such that
$\sup_{t \ge 0} |R_n(t) - (1 - e^{-\alpha t})|$ converges stochastically to
zero as n increases. This follows from the lemma at the beginning of this section,
and implies that when the hypothesis is true, $\inf_{\beta > 0} \sup_{t \ge 0} |R_n(t) - (1 - e^{-\beta t})|$
converges stochastically to zero as n increases. If the hypothesis is not true, then
the true common distribution is H(x), say, with density h(x). Then, defining
S(t) as

    $1 - \int_{-\infty}^{\infty} \exp\left\{\frac{-\beta t\, h(x)}{f[F^{-1}(H(x))]}\right\} h(x)\, dx,$


$\sup_{t \ge 0} |R_n(t) - S(t)|$ converges stochastically to zero as n increases. But S(t)
will equal $1 - e^{-\alpha t}$ for some positive α if and only if the hypothesis is true, and
therefore when the hypothesis is not true, $\inf_{\beta > 0} \sup_{t \ge 0} |R_n(t) - (1 - e^{-\beta t})|$
will not converge stochastically to zero. The fact that the distribution of
$\inf_{\beta > 0} \sup_{t \ge 0} |R_n(t) - (1 - e^{-\beta t})|$ is independent of the parameters c, b when the
hypothesis is true follows immediately from the fact that if A, B are constants
(A > 0), and $\bar{R}_n(t)$ is the expression defined in terms of $AX_1 + B$, $AX_2 + B$,
⋯, $AX_n + B$ in exactly the same way as $R_n(t)$ was defined in terms of
$X_1, X_2, \cdots, X_n$, then $\inf_{\beta > 0} \sup_{t \ge 0} |\bar{R}_n(t) - (1 - e^{-\beta t})|$ is equal to


    $\inf_{\beta > 0} \sup_{t \ge 0} |R_n(t) - (1 - e^{-\beta t})|.$
THE ASYMPTOTIC POWER OF CERTAIN TESTS OF FIT BASED ON
SAMPLE SPACINGS¹


By Lionel Weiss
Cornell University


1. Introduction and summary. Suppose $X_1, X_2, \cdots, X_n$ are independent
and identically distributed chance variables, each with density $f(x)$, where
$\int_0^1 f(x)\,dx = 1$, $f(x)$ has a finite number of discontinuities, and there are two
constants $A$, $B$ ($0 < A < B < \infty$) such that $A \le f(x) \le B$ for all $x$ in $(0, 1)$.

Let $Y_0$ denote zero, $Y_{n+1}$ denote unity, and let $Y_1 \le Y_2 \le \cdots \le Y_n$ be the
ordered values of $X_1, X_2, \cdots, X_n$. Define $T_i$ as $Y_i - Y_{i-1}$ for $i = 1, \cdots, n+1$.
Let $r$ be any positive number greater than unity, and let $V(n)$ denote $\sum_{i=1}^{n+1} T_i^r$.
The following theorem was proved in [1].

THEOREM A. If $f(x) = 1$ for $x$ in $[0, 1]$, then the distribution of

$$\frac{n^{r-1/2}\,V(n) - \sqrt{n}\,\Gamma(r+1)}{\sqrt{\Gamma(2r+1) - (r^2+1)\,\Gamma^2(r+1)}}$$

approaches the standard normal distribution as $n$ increases. In the present paper,
we prove the following generalization of Theorem A:
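THEOREM 1: The distribution of

$$\frac{n^{r-1/2}\,V(n) - \sqrt{n}\,\Gamma(r+1)\int_0^1 f^{1-r}(x)\,dx}
{\sqrt{[\Gamma(2r+1) - 2r\,\Gamma^2(r+1)]\int_0^1 f^{1-2r}(x)\,dx - (r-1)^2\,\Gamma^2(r+1)\left[\int_0^1 f^{1-r}(x)\,dx\right]^2}}$$

approaches the standard normal distribution as $n$ increases.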


Theorem 1 can be used to compute the asymptotic power of certain tests of 
fit based on V(n). 


2. Proof of Theorem 1 when $f(x)$ is a step function. First we prove Theorem 1
for the case when there are $H$ subintervals $I_1, \cdots, I_H$, $I_1 = [0, z_1)$,
$I_2 = [z_1, z_2), \cdots, I_H = [z_{H-1}, 1)$, so that on $I_i$, $f(x) = a_i$, where
$0 < A \le a_i \le B$. Let $N_i$ denote the number of the values $X_1, \cdots, X_n$ which fall
in the interval $I_i$, and let ${}_iY_1 \le {}_iY_2 \le \cdots \le {}_iY_{N_i}$ be the ordered values of
these values in $I_i$. Denote $z_{i-1}$ by ${}_iY_0$, and $z_i$ by ${}_iY_{N_i+1}$. $z_0$ is to denote zero,
$z_H$ denotes unity. Define ${}_iT_j$ as ${}_iY_j - {}_iY_{j-1}$, for $j = 1, \cdots, N_i + 1$. Define $V_i$
as $\sum_{j=1}^{N_i+1} {}_iT_j^r$. From Theorem A quoted above and from an examination of the
conditional distribution of ${}_iY_1, \cdots, {}_iY_{N_i}$ given $N_i$, it follows that the
conditional distribution given $N_i$ of

$$Q_i = \frac{N_i^{\,r-1/2}\,\dfrac{V_i}{(z_i - z_{i-1})^r} - \sqrt{N_i}\,\Gamma(r+1)}{\sqrt{\Gamma(2r+1) - (r^2+1)\,\Gamma^2(r+1)}}$$


Received July 23, 1956; revised July 26, 1956. 
1 Research under contract with the Office of Naval Research. It may be reproduced in 
approaches the standard normal distribution as $N_i$ increases. Also, the
conditional distribution of ${}_iY_1, \cdots, {}_iY_{N_i}$ given $N_1, \cdots, N_H$ depends only on $N_i$,
while the joint distribution of $N_1, \cdots, N_H$ is multinomial with parameters
$n$, $a_1(z_1 - z_0), \cdots, a_H(z_H - z_{H-1})$. From these facts, it follows that the joint
distribution of

$$\left\{\sqrt{n}\left(\frac{N_1}{n} - a_1(z_1 - z_0)\right), \cdots, \sqrt{n}\left(\frac{N_{H-1}}{n} - a_{H-1}(z_{H-1} - z_{H-2})\right), Q_1, \cdots, Q_H\right\}$$

approaches the joint distribution of $\{S_1, \cdots, S_{H-1}, T_1, \cdots, T_H\}$ as $n$ increases,
where this last set of chance variables has a joint normal distribution with zero
means and covariance matrix $\|\sigma_{ij}\|$ ($i, j = 1, \cdots, 2H - 1$), where $\sigma_{ii} = 1$ if
$i \ge H$, $\sigma_{ij} = 0$ if $i \ne j$ and $i$ and/or $j$ is $\ge H$,
$\sigma_{ii} = a_i(z_i - z_{i-1})[1 - a_i(z_i - z_{i-1})]$ if $i < H$,
and $\sigma_{ij} = -a_i a_j (z_i - z_{i-1})(z_j - z_{j-1})$ if $i$, $j$ are both $< H$ and $i \ne j$.

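Now $V(n)$ is equal to

$$\sum_{i=1}^{H} V_i - \sum_{i=1}^{H-1} {}_iT_{N_i+1}^{\,r} - \sum_{i=2}^{H} {}_iT_1^{\,r} + \sum_{i=1}^{H-1} \left({}_iT_{N_i+1} + {}_{i+1}T_1\right)^r. \tag{2.1}$$

It can be verified easily from (2.1) and an examination of the distribution of
${}_iT_j$ that $n^{r-1/2}\left[V(n) - \sum_{i=1}^{H} V_i\right]$ converges stochastically to zero as $n$ increases.
Therefore if

$$\frac{n^{r-1/2}\sum_{i=1}^{H} V_i - \sqrt{n}\,\Gamma(r+1)\int_0^1 f^{1-r}(x)\,dx}
{\sqrt{[\Gamma(2r+1) - 2r\,\Gamma^2(r+1)]\int_0^1 f^{1-2r}(x)\,dx - (r-1)^2\,\Gamma^2(r+1)\left[\int_0^1 f^{1-r}(x)\,dx\right]^2}} \tag{2.2}$$

has a limiting standard normal distribution as $n$ increases, Theorem 1 is proved
when $f(x)$ is a step function. Let us denote $\sqrt{n}[(N_i/n) - a_i(z_i - z_{i-1})]$ by $W_i$,
and note that $W_1 + \cdots + W_H$ is identically equal to zero. The numerator of
(2.2) can be written as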


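$$\begin{aligned}
&\sqrt{\Gamma(2r+1) - (r^2+1)\,\Gamma^2(r+1)}\;\sum_{i=1}^{H} \left(\frac{n}{N_i}\right)^{r-1/2} (z_i - z_{i-1})^r\, Q_i \\
&\quad + \sqrt{n}\,\Gamma(r+1) \sum_{i=1}^{H} (z_i - z_{i-1})^r \left[\left(\frac{W_i}{\sqrt{n}} + a_i(z_i - z_{i-1})\right)^{1-r} - \bigl(a_i(z_i - z_{i-1})\bigr)^{1-r}\right],
\end{aligned} \tag{2.3}$$

and remembering that $N_i/n$ converges to $a_i(z_i - z_{i-1})$ with probability one as $n$
increases, (2.3) has the same limiting distribution as

$$\sqrt{\Gamma(2r+1) - (r^2+1)\,\Gamma^2(r+1)}\;\sum_{i=1}^{H} (z_i - z_{i-1})^{1/2}\, a_i^{\,1/2 - r}\, Q_i \;-\; (r-1)\,\Gamma(r+1) \sum_{i=1}^{H} \frac{W_i}{a_i^{\,r}}. \tag{2.4}$$

But from the discussion above, it is easily verified that the distribution of (2.4)
approaches a normal distribution with mean zero and variance equal to the
square of the denominator of (2.2). This proves Theorem 1 when $f(x)$ is a step
function.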


3. Proof of Theorem 1 in the general case. The proof in the general case seems
to require a great number of details, which we merely outline. In the first place,
we may assume that $f(x)$ is continuous on $(0, 1)$, for if it has a finite number of
discontinuities, we may handle each subinterval on which it is continuous
separately, and then put them together as in Sec. 2. Then, defining $\Delta_i$ as
$|F(Y_i) - i/n|$ (with $F$ the distribution function belonging to $f$), and remembering
that $f(x) \ge A > 0$, we find that
$|Y_i - F^{-1}(i/n)| \le \Delta_i/A$. We have $F(Y_{i+1}) - F(Y_i) = f(\theta_i)[Y_{i+1} - Y_i]$, where
$Y_i < \theta_i < Y_{i+1}$, or $F^{-1}(i/n) - (\Delta_i/A) < \theta_i < F^{-1}((i+1)/n) + (\Delta_{i+1}/A)$. Then
we may write

$$F(Y_{i+1}) - F(Y_i) = f\left[F^{-1}\left(\frac{i}{n}\right)\right][Y_{i+1} - Y_i] + \gamma_i\,[Y_{i+1} - Y_i],$$

where $\gamma_i = f(\theta_i) - f[F^{-1}(i/n)]$. Due to the uniform continuity of $f(x)$, and the
fact that max; n°” “A; converges stochastically to zero as n increases, we shall 
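be able to ignore the term $\gamma_i[Y_{i+1} - Y_i]$ in certain respects. We denote
$F(Y_{i+1}) - F(Y_i)$ by $U_{i+1}$, and $Y_{i+1} - Y_i$ by $T_{i+1}$. Then we may write

$$T_{i+1} = \frac{U_{i+1}}{f[F^{-1}(i/n)]} - \frac{\gamma_i\,T_{i+1}}{f[F^{-1}(i/n)]}. \tag{3.1}$$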


We are going to examine the moments of the chance variable $W = \sum_{j=1}^{n+1} n^r T_j^r$,
and it is clear from an examination of (3.1) that the leading terms of these
moments will be the corresponding moments of, say,

$$Q = \sum_{j=1}^{n+1} \frac{(nU_j)^r}{f^r\left[F^{-1}\left(\frac{j-1}{n}\right)\right]}.$$
n 
Let $V_1, \cdots, V_{n+1}$ be independent chance variables, each with density $e^{-v}$ for
$v > 0$. Then $E\{V_{i_1}^{a_1} V_{i_2}^{a_2} \cdots V_{i_k}^{a_k}\} = \Gamma(a_1 + 1)\Gamma(a_2 + 1) \cdots \Gamma(a_k + 1)$. Also, it
is well known that

$$E\{(nU_{i_1})^{a_1} (nU_{i_2})^{a_2} \cdots (nU_{i_k})^{a_k}\} = \frac{n^{a_1 + \cdots + a_k}\,\Gamma(n+1)\,\Gamma(a_1+1)\Gamma(a_2+1)\cdots\Gamma(a_k+1)}{\Gamma(n + a_1 + \cdots + a_k + 1)},$$

and this last expression approaches $\Gamma(a_1 + 1) \cdots \Gamma(a_k + 1)$ as $n$ increases. That
is, with respect to their moments, the chance variables $nU_1, \cdots, nU_{n+1}$ act
like the independent chance variables $V_1, \cdots, V_{n+1}$.
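
The limiting statement about the moments can be checked numerically. The following is a minimal sketch, assuming the moment formula for uniform spacings in the form $n^{a_1+\cdots+a_k}\,\Gamma(n+1)\prod\Gamma(a_j+1)/\Gamma(n+a_1+\cdots+a_k+1)$; the function names are illustrative and not part of the paper.

```python
from math import exp, gamma, lgamma, log


def scaled_spacing_moment(n, powers):
    """E{(n U_1)^a_1 ... (n U_k)^a_k} for k distinct spacings of n uniform points.

    Evaluated on the log scale so that large n does not overflow.
    """
    total = sum(powers)
    log_value = (
        total * log(n)
        + lgamma(n + 1)
        + sum(lgamma(a + 1) for a in powers)
        - lgamma(n + total + 1)
    )
    return exp(log_value)


def exponential_moment(powers):
    """E{V_1^a_1 ... V_k^a_k} for independent unit exponential variables."""
    value = 1.0
    for a in powers:
        value *= gamma(a + 1)
    return value


if __name__ == "__main__":
    powers = (2.0, 1.5)
    for n in (10, 100, 10_000, 1_000_000):
        ratio = scaled_spacing_moment(n, powers) / exponential_moment(powers)
        print(n, ratio)  # the ratio should drift toward 1 as n grows
```

The ratio of the two moments is below one for every finite $n$ and tends to one, which is the sense in which the scaled spacings "act like" independent exponential variables.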

Defining the chance variable $Q'$ as

$$Q' = \sum_{j=1}^{n+1} \frac{V_j^{\,r}}{f^r\left[F^{-1}\left(\frac{j-1}{n}\right)\right]},$$

it is known that $E\{[(Q' - EQ')/\sigma_{Q'}]^k\}$ approaches $\mu_k$, the $k$th moment of a
standard normal chance variable, for any positive integral $k$. From the discussion
above, one might expect the same to hold for $E\{[(Q - EQ)/\sigma_Q]^k\}$, and a
detailed examination shows that this is so. It is also so for $E\{[(W - EW)/\sigma_W]^k\}$,
since the terms in this not given by the corresponding terms with $W$ replaced
by $Q$ approach zero in the limit, due to the properties of $\gamma_i$ defined above. This
completes the proof.


4. The asymptotic power of certain tests of fit. To test the hypothesis that
$f(x) = 1$ for $0 \le x \le 1$, the test that rejects when $V(n) \ge C_n(\alpha)$ has been
suggested, where $C_n(\alpha)$ is a constant depending on the sample size $n$ and on the
desired level of significance $\alpha$. Denote $(1/\sqrt{2\pi}) \int_v^{\infty} e^{-t^2/2}\,dt$ by $\phi(v)$, and let $k(\alpha)$
denote the value such that $\phi(k(\alpha)) = \alpha$. Then Theorem A shows that for large $n$,
$C_n(\alpha)$ is approximately equal to

$$n^{1/2 - r}\left[\sqrt{n}\,\Gamma(r+1) + k(\alpha)\sqrt{\Gamma(2r+1) - (r^2+1)\,\Gamma^2(r+1)}\right],$$

while if the true common density is $f(x)$, then the large-sample power of the test
is approximately equal to

$$\phi\left(\frac{n^{r-1/2}\,C_n(\alpha) - \sqrt{n}\,\Gamma(r+1)\int_0^1 f^{1-r}(x)\,dx}
{\sqrt{[\Gamma(2r+1) - 2r\,\Gamma^2(r+1)]\int_0^1 f^{1-2r}(x)\,dx - (r-1)^2\,\Gamma^2(r+1)\left[\int_0^1 f^{1-r}(x)\,dx\right]^2}}\right).$$
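
The power approximation can be evaluated numerically. The sketch below is illustrative only: it assumes that $\phi$ is the upper-tail normal probability, that the critical value is taken from the large-sample form of $C_n(\alpha)$ given by Theorem A, and that the integrals of $f^{1-r}$ and $f^{1-2r}$ may be approximated by a midpoint rule. The function names and the example density $f(x) = 1/2 + x$ are assumptions, not taken from the paper.

```python
from math import gamma, sqrt
from statistics import NormalDist


def integral01(g, steps=20000):
    """Midpoint-rule approximation to the integral of g over (0, 1)."""
    h = 1.0 / steps
    return h * sum(g((j + 0.5) * h) for j in range(steps))


def asymptotic_power(f, r, n, alpha):
    """Large-sample power of the test rejecting for large V(n), per Theorem 1."""
    g1 = gamma(r + 1)
    g2 = gamma(2 * r + 1)
    normal = NormalDist()
    k_alpha = normal.inv_cdf(1 - alpha)            # upper-tail point k(alpha)
    null_sd = sqrt(g2 - (r * r + 1) * g1 * g1)
    scaled_crit = sqrt(n) * g1 + k_alpha * null_sd  # n^(r-1/2) * C_n(alpha)
    i1 = integral01(lambda x: f(x) ** (1 - r))
    i2 = integral01(lambda x: f(x) ** (1 - 2 * r))
    alt_mean = sqrt(n) * g1 * i1
    alt_sd = sqrt((g2 - 2 * r * g1 * g1) * i2 - (r - 1) ** 2 * g1 * g1 * i1 * i1)
    return 1 - normal.cdf((scaled_crit - alt_mean) / alt_sd)


if __name__ == "__main__":
    print(asymptotic_power(lambda x: 1.0, r=2, n=100, alpha=0.05))      # null case
    print(asymptotic_power(lambda x: 0.5 + x, r=2, n=100, alpha=0.05))  # an alternative
```

Under the uniform density both integrals equal one, so the expression reduces to $\phi(k(\alpha)) = \alpha$, which gives a quick consistency check on the formula.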
THE DISTRIBUTION OF THE NUMBER OF LOCALLY MAXIMAL
ELEMENTS IN A RANDOM SAMPLE


By T. Austin, R. Fagen, T. LEHRER, AND W. PENNEY 


Washington, D. C. 


0. Summary. The distribution of the number of different locally maximal ele- 
ments in a random sample is found, where the sampling is from a continuous 
population of real numbers. This distribution has application in certain non- 
parametric tests; the problem of finding the distribution may be regarded as 
identical with the enumeration of permutations according to the number of dis- 
tinct locally maximal elements. 


1. Introduction. An ordered sample of n real numbers is drawn at random 
from a population having a continuous distribution. For a given integer k, an 
element of the sample is called locally maximal if it is the largest of some k con- 
secutive elements of the sample. The distribution of the number of different 


Received September 4, 1956. 
locally maximal elements in a random sample is discussed in the following;
this distribution can be used as a basis for certain nonparametric tests in a 
manner analogous to other order statistics. 

Although the problem arose in just such a context, it can, as will be indicated 
below, be treated as a purely combinatorial problem. The problem is then to 
enumerate the permutations on n objects according to the number of different 
locally maximal elements and so belongs to a class of problems similar to those 
studied by Riordan [1], Sprague [2], Sade [3], e.g., classifications of permutations 
according to various characteristics, such as rising sequences, falling sequences, 
readings [cf. Riordan [1]], etc. 


2. Locally maximal elements. Let $\Omega$ be a population of real numbers with
continuous distribution function, and let $O_n = (x_1, x_2, \cdots, x_n)$ be an ordered
sample of size $n$ drawn at random from $\Omega$. For a given $k$ let

$$y_i = \max(x_{i+1}, x_{i+2}, \cdots, x_{i+k}), \qquad i = 0, 1, 2, \cdots, n - k.$$

Let

$$z_j = \begin{cases} 1 & \text{if } y_i = x_j \text{ for at least one } i, \\ 0 & \text{otherwise,} \end{cases} \qquad j = 1, 2, \cdots, n.$$

If $z_j = 1$, then $x_j$ will be called a $k$-maximal element, or for brevity, a maximal
element. Note that the probability of a tie, i.e., the event $x_l = x_m$ for $l \ne m$, is
zero, and therefore there is no essential ambiguity in the definition of $z_j$. Clearly
$z_j$ is itself a random variable, being a function of a random sample. Thus the
sequence $z_1, z_2, \cdots, z_n$ is a sequence of random variables (which are neither
independent nor identically distributed) associated with the sample $O_n$. Now
let $S_n = \sum_{j=1}^{n} z_j$. The problem is to find the distribution of the random variable
$S_n$, i.e., the set of numbers $\{p_s\}$,

$$p_s = P[S_n = s], \qquad s = 0, 1, 2, \cdots, n. \tag{1}$$


It is easily seen that the distribution of $S_n$ is independent of the underlying
distribution of $\Omega$, and depends only on the order relationships among the numbers
$x_1, \cdots, x_n$. It is convenient, therefore, to replace these numbers by the proper
permutation of the integers $1, 2, \cdots, n$, i.e., that permutation having the same
order relationships as $x_1, x_2, \cdots, x_n$. By symmetry, all permutations of
$1, 2, \cdots, n$ have an equal probability of occurrence, so the distribution of $S_n$
may be obtained by finding the number $f_k(n, i)$ of permutations of the first $n$
integers which have exactly $i$ different maximal elements.
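
The definition of a $k$-maximal element and the reduction to permutations can be illustrated by direct enumeration. This is a brute-force sketch for small $n$ only; the function names are illustrative and do not come from the paper.

```python
from collections import Counter
from itertools import permutations


def count_k_maximal(sample, k):
    """Number of distinct positions holding the largest of some k consecutive elements."""
    n = len(sample)
    winners = set()
    for start in range(n - k + 1):
        window = range(start, start + k)
        winners.add(max(window, key=lambda j: sample[j]))
    return len(winners)


def brute_force_counts(n, k):
    """List whose entry i is the number of permutations of 1..n with exactly i maximal elements."""
    tally = Counter(count_k_maximal(p, k) for p in permutations(range(1, n + 1)))
    return [tally.get(i, 0) for i in range(n + 1)]


if __name__ == "__main__":
    print(count_k_maximal([3, 1, 2, 5, 4], k=3))
    print(brute_force_counts(4, 3))
    print(brute_force_counts(5, 3))
```

Dividing each count by $n!$ gives the probabilities $p_s$; enumeration is only practical for small $n$ because the number of permutations grows as $n!$.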


3. Recurrence relationship and generating function for the numbers $f_k(n, i)$.
A recurrence relation for the numbers $\{f_k(n, i)\}$ can be found in the following
manner:

Consider all permutations of the first $n + 1$ integers in which the largest
element is in the $(m + 1)$st position, i.e., those permutations of the form
$a_1, a_2, \cdots, a_m, n + 1, a_{m+2}, \cdots, a_{n+1}$. Certain of these have exactly $i + 1$
different maximal elements. To enumerate such permutations, note that the
element $(n + 1)$ is necessarily maximal so that the permutations $(a_1, a_2, \cdots, a_m)$
and $(a_{m+2}, a_{m+3}, \cdots, a_{n+1})$ must between them contribute $i$ different maximal
elements. The $m$ integers $a_1, \cdots, a_m$ which appear to the left of $(n + 1)$ can be
selected in $\binom{n}{m}$ ways; for any of these choices there are $f_k(m, r)$ permutations
of $a_1, a_2, \cdots, a_m$ which have $r$ maximal elements. Similarly, there are
$f_k(n - m, i - r)$ permutations of the remaining integers $a_{m+2}, a_{m+3}, \cdots, a_{n+1}$
which have $i - r$ different maximal elements. Thus the total number of
permutations of the first $n + 1$ integers which have the largest element in the $(m + 1)$st
position and which have $i + 1$ distinct maximal elements is

$$\sum_{r=0}^{i} f_k(m, r)\, f_k(n - m, i - r) \binom{n}{m}.$$

Summing on $m$, the total number, $f_k(n + 1, i + 1)$, of permutations of the first
$n + 1$ integers with $i + 1$ different maximal numbers is given by

$$f_k(n + 1, i + 1) = \sum_{m=0}^{n} \sum_{r=0}^{i} f_k(m, r)\, f_k(n - m, i - r) \binom{n}{m}, \tag{2}$$


with the following boundary conventions:

$$\begin{aligned}
f_k(n, 0) &= n!; & n &< k, \\
f_k(n, 0) &= 0; & n &\ge k, \\
f_k(n, i) &= 0; & i &> 0,\ n < k.
\end{aligned} \tag{3}$$

Note that, with these conventions, (2) holds for all $n$ whenever $i > 0$ and for
values of $n \ge k - 1$ when $i = 0$; (3) must be used to determine $f_k(n, i)$ for other
values of $n$.

Using (2) and (3) the numbers $f_k(n, i)$ can be calculated recursively, and thus
the desired distribution in (1) can be found for any fixed $n$ since

$$p_s = P[S_n = s] = \frac{f_k(n, s)}{n!}. \tag{4}$$


Another way of generating the distribution in (4) arises from considering
the generating function $v_k(x, y)$ of the numbers $f_k(n, s)/n!$. Let

$$v_k(x, y) = \sum_{n=0}^{\infty} \sum_{s=0}^{\infty} \frac{f_k(n, s)}{n!}\, x^n y^s. \tag{5}$$

From (5), it is obvious that

$$\left.\frac{\partial^n v_k}{\partial x^n}\right|_{x=0} = \sum_{s=0}^{\infty} f_k(n, s)\, y^s. \tag{6}$$

Equation (6) thus gives the generating function for the numbers $f_k(n, s)$ for
fixed $n$, and hence the generating function for the entire distribution in (4)
is just

$$\frac{1}{n!} \left.\frac{\partial^n v_k}{\partial x^n}\right|_{x=0}.$$

Furthermore, it can be shown from (2), (3), and (5) that $v_k(x, y)$ satisfies the
differential equation

$$\frac{\partial v}{\partial x} = y v^2 + (1 - y)\left(1 + 2x + 3x^2 + \cdots + (k - 1)x^{k-2}\right). \tag{7}$$

Also, note that from (3), $v(0, y) = 1$, and so

$$\left.\frac{\partial v}{\partial x}\right|_{x=0} = 1.$$


The generating function in (6) can therefore be found by repeated differentiation
of (7). As a check, it can be shown by induction, using (3) and (6) for $y = 1$,
that $\sum_{s=0}^{n} f_k(n, s) = n!$, as is of course necessary. Similarly one may show that
for $n \ge k$ the mean of $S_n$ is given by

$$E(S_n) = \sum_{s=0}^{n} \frac{s\, f_k(n, s)}{n!} = \frac{1}{n!} \left[\frac{\partial}{\partial y}\left(\frac{\partial^n v}{\partial x^n}\right)\right]_{x=0,\,y=1} = \frac{2n - k + 1}{k + 1}.$$

These relations may also be derived, though less easily, by induction from (2)
and (3).


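4. Numerical examples. As an example, the first few values of $(\partial^n v_k/\partial x^n)|_{x=0}$
for $k = 3$, as found from (7), are given below:

$$\begin{aligned}
\left.\frac{\partial v}{\partial x}\right|_{x=0} &= 1, \\
\left.\frac{\partial^2 v}{\partial x^2}\right|_{x=0} &= 2, \\
\left.\frac{\partial^3 v}{\partial x^3}\right|_{x=0} &= 6y, \\
\left.\frac{\partial^4 v}{\partial x^4}\right|_{x=0} &= 12y + 12y^2, \\
\left.\frac{\partial^5 v}{\partial x^5}\right|_{x=0} &= 24y + 72y^2 + 24y^3, \\
\left.\frac{\partial^6 v}{\partial x^6}\right|_{x=0} &= 408y^2 + 264y^3 + 48y^4, \\
\left.\frac{\partial^7 v}{\partial x^7}\right|_{x=0} &= 1008y^2 + 3120y^3 + 816y^4 + 96y^5, \\
\left.\frac{\partial^8 v}{\partial x^8}\right|_{x=0} &= 2016y^2 + 18624y^3 + 17376y^4 + 2112y^5 + 192y^6.
\end{aligned} \tag{8}$$

The coefficients $f_k(n, s)$ of $y^s$ in Eq. (8) can also be computed directly from (2).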
PERCOLATION PROCESSES: LOWER BOUNDS FOR THE
CRITICAL PROBABILITY


By J. M. HAMMERSLEY 


United Kingdom Atomic Energy Research Establishment 


1. Introduction. A percolation process is the spread of a fluid through a 
medium under the influence of a random mechanism associated with the me- 
dium. This contrasts with a diffusion process, where the random mechanism 
is associated with the fluid. Broadbent and Hammersley [1] gave examples 
illustrating the distinction. 

Here we shall consider a medium consisting of an infinite set of atoms and 
bonds. A bond is a path between two atoms: it may be undirected (in which 
case it will allow passage from either atom to the other) or it may be directed 
(in which case it will allow passage from one atom to the other but not vice 
versa). Two atoms may be linked by several bonds, some directed and some 
undirected. Broadbent and Hammersley [1] dealt with crystals, i.e., media in 
which the atoms and bonds satisfied three postulates denoted by P1, P2, and 
P3. Here, however we shall dispense with P1 and a part of P3; and our sur- 
viving assumptions are: 

P2. The number of bonds from (but not necessarily to) any atom is finite. 

P3(a). Any finite subset of atoms contains an atom from which a bond leads 

to some atom not in the subset. 

With this medium we associate the following random mechanism: each bond 
has an independent probability p of being undammed and q = 1 — p of being 
dammed. Fluid, supplied to the medium at a set of source atoms, spreads along 
undammed bonds only (and in the permitted direction only for undammed 
directed bonds) and thereby wets the atoms it reaches. Associated with each
atom $A$, there is a critical probability $p_d(A)$, defined as the supremum of all
values of $p$ such that, when $A$ is the only source atom, $A$ wets only finitely
many atoms with probability one. We seek lower bounds for $p_d$.

An $n$-stepped walk is an ordered connected path along $n$ bonds, each step
being in a permitted direction along its bond and starting from the atom reached
by the previous step. Walks (as opposed to fluid) may traverse dammed bonds:
a walk is dammed or undammed according as it traverses at least one or no
dammed bond. A walk is self-avoiding if it visits no atom more than once. The
number of $n$-stepped self-avoiding walks starting from the atom $A$ is denoted
by $f_A(n)$. The connective bound of the medium is defined to be

$$\lambda = \sup_A \limsup_{n \to \infty} n^{-1} \log f_A(n). \tag{1}$$

Hammersley [3] showed that, under the fuller assumptions required by a crystal,
there existed a connective constant

$$\kappa = \lim_{n \to \infty} n^{-1} \log f_A(n), \tag{2}$$

independent of $A$. Clearly $\lambda = \kappa$ when the latter exists.

The principal $n$-neighbourhood of an atom $A$, written $N^n(A)$, is the atom $A$
together with all atoms accessible from $A$ by walks of $n$ or fewer steps. The
principal $n$-boundary of $A$ is $B^n(A) = N^n(A) - N^{n-1}(A)$, where $N^0(A)$ is $A$
alone. A walk belongs to a set of atoms $S$ if every step of the walk starts from
some atom of $S$: notice that the final step may terminate at some atom not in $S$.


2. Statement of results. Let $E_n(A, p)$ denote the expected number of atoms
of $B^n(A)$ which can be reached from $A$ by at least one undammed walk
belonging to $N^{n-1}(A)$. Define $F_n = F_n(p) = \sup_A E_n(A, p)$.

THEOREM 1. $F_n(p_d + 0) \ge 1$.

It is an easy matter to show from P2 and P3(a) that $N^n(A)$ has only finitely
many atoms, and that every atom of $B^n(A)$ can be reached from $A$ by at least
one (perhaps dammed) walk belonging to $N^{n-1}(A)$, and that $B^n(A)$ contains at
least one atom. Therefore $E_n(A, p)$ is an increasing function of $p$, and $F_n(p)$ is
a nondecreasing function of $p$, and $0 = E_n(A, 0) = F_n(0)$ and $1 \le E_n(A, 1) \le
F_n(1)$. Thus Theorem 1 provides a lower bound for $p_d$, because $p_d$ exceeds any
solution of $F_n(p) < 1$.

Let $P_n = P_n(A)$ be the probability that the single source atom $A$ wets at
least one atom of $B^n(A)$.

THEOREM 2. If $F_n < 1$ for some particular $n$, then $P_N \le F_n^{[N/n]}$ for all $N$,
where $[N/n]$ denotes the integer part of $N/n$.

Theorem 2 is a rather more precise form of Theorem 1 and will be required
elsewhere [5]. Here we shall deduce Theorem 1 from Theorem 2.

THEOREM 3. $p_d \ge e^{-\lambda}$.

Theorem 3 is a straightforward generalization of a previous result ([1],
Theorem 7).

I have not yet succeeded in proving or disproving

CONJECTURE 1. For each fixed $p$, $F_n$ is a subexponential function of $n$, that is
to say, $F_{m+n} \le F_m F_n$.

Example 1 below shows that Theorem 1 is sometimes stronger than Theorem
3. Theoretically, Theorem 1 is never weaker than Theorem 3. However, when
the medium is a crystal it is not hard to estimate $\kappa$ by Monte Carlo methods,
and Theorem 3 may prove more useful than Theorem 1 in cases in which $F_n$ is
hard to calculate for large $n$. Similarly, it may occasionally happen that Theorem
1, although weakened theoretically thereby, may yet be strengthened practically
by redefining $E_n$ as the expected number of wet atoms in $B^n(A)$.

EXAMPLE 1. Let the atoms of the medium lie in the Euclidean plane at the
points $(x, y)$, where $x, y = 0, 1, 2, \cdots$. Suppose that from each $(x, y)$ there
is a directed bond to $(x + 1, y)$ and a directed bond to $(x, y + 1)$. Then $f_A(n) =
2^n$, $\lambda = \log 2$, and Theorem 3 gives $p_d \ge \tfrac{1}{2}$. However, $F_2(p) = 4p^2 - p^4$; so that
Theorem 1 gives $p_d \ge \sqrt{3/2} - \sqrt{1/2} = 0.518\cdots$. $F_3$ gives a slightly sharper
result, but is much more tedious to calculate. We may also notice that $F_1(p) =
2p$; so that $F_2 < F_1^2$, and inequality is certainly required in Conjecture 1.
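
The quantities $F_1$, $F_2$, $F_3$ of Example 1 can be obtained by enumerating every dammed/undammed configuration of the bonds leaving atoms within $n - 1$ steps of the source. The sketch below is illustrative: the function names are hypothetical, and it relies on the observation that in this lattice the view from every atom is the same, so that $F_n(p) = E_n(A, p)$ for the atom $A = (0, 0)$.

```python
from itertools import product


def reach_weights(n):
    """weights[j] = total number of boundary atoms wetted, summed over all
    configurations with exactly j undammed bonds (bonds leaving atoms with x + y < n)."""
    bonds = []
    for x in range(n):
        for y in range(n - x):
            bonds.append(((x, y), (x + 1, y)))
            bonds.append(((x, y), (x, y + 1)))
    weights = [0] * (len(bonds) + 1)
    for state in product((0, 1), repeat=len(bonds)):
        open_from = {}
        for (src, dst), is_open in zip(bonds, state):
            if is_open:
                open_from.setdefault(src, []).append(dst)
        wet, stack = {(0, 0)}, [(0, 0)]
        while stack:                       # depth-first spread along undammed bonds
            atom = stack.pop()
            for nxt in open_from.get(atom, ()):
                if nxt not in wet:
                    wet.add(nxt)
                    stack.append(nxt)
        weights[sum(state)] += sum(1 for (x, y) in wet if x + y == n)
    return weights


def F(weights, p):
    """Expected number of wetted atoms on the n-boundary at bond probability p."""
    total_bonds = len(weights) - 1
    return sum(w * p**j * (1 - p) ** (total_bonds - j) for j, w in enumerate(weights))


def lower_bound(weights, tol=1e-10):
    """Bisection for the root of F_n(p) = 1 on (0, 1); F_n is nondecreasing in p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if F(weights, mid) < 1:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, lower_bound(reach_weights(n)))
```

The enumeration visits $2^{n(n+1)}$ configurations, which is why $F_3$ (4096 configurations) is still cheap while larger $n$ quickly become tedious, as the text remarks.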

EXAMPLE 2. Consider the familiar branching process in which each individual
has 0, 1, or 2 descendants with independent probabilities $q^2$, $2pq$, $p^2$. This, like
all branching processes, is a special form of percolation process (see [1] for
further details of this question). We consider the atoms of the medium to be
the actual or potential individuals of the branching process. Each atom has
just two directed bonds from it, and the fluid is life transforming a potential
into an actual individual. We have $F_n = (2p)^n$ and $p_d = \tfrac{1}{2}$, agreeing with
well-known results. Also $\lambda = \log 2$. Thus Theorems 1 and 3 are equally strong and
best possible. Also Conjecture 1 holds with equality. This and other branching
processes suggest

PROBLEM 1. Under what conditions is $\liminf_{n \to \infty} F_n(p_d) = 1$ valid?

We cannot in general use $E$ in the role of $F$ nor omit the $+0$ in Theorem 1,
though perhaps the exceptional cases are rare. Example 3 provides one such
exception.

EXAMPLE 3. Suppose that the atoms $A_1, A_2, \cdots$ are connected by $2i$ directed
bonds from $A_i$ to $A_{i+1}$. Then

$$E_n(A_m, p) = \prod_{i=m}^{m+n-1} (1 - q^{2i}) < 1$$

for $p < 1$. However $p_d = 0$, because $E_\infty(A_m, p)$ is also the probability that
$A_m$ wets infinitely many atoms and the infinite product converges for $q < 1$.
As a matter of passing interest, $E_\infty(A_1, p) = (\vartheta_1'/2q^{1/4})^{1/3}$, where $q$ plays the
usual theta-function role of Jacobi's nome [6], p. 473; see also [2], Sec. 21.7:
indeed, $2i$ rather than $i$ bonds from $A_i$ preserved the nomenclature.


3. Proof of theorems. Let $A_1$ be a fixed atom and $n$ a fixed positive integer.
In studying the spread of fluid from the single source atom $A_1$, we shall suppose
that the spreading occurs in consecutive recursively defined stages. Immediately
before the $j$th stage takes place, we shall know two sets of atoms, denoted
by $W(j)$ and $S(j)$ respectively. Here $W(j)$ is the set of atoms already wet up to
but excluding the $j$th stage, and $S(j)$ is the set of atoms about to serve as sources
in the $j$th stage. The process starts from $W(1) = S(1) = A_1$. If $S(j)$ is empty,
then $S(j + 1)$ is empty, and $W(j + 1) = W(j)$. If $S(j)$ is not empty, let $A$ be
any atom of $S(j)$ and proceed as follows. Define $X(A)$ to be the set of all atoms
in $N^n(A) - W(j)$, which can be reached from $A$ by at least one undammed
self-avoiding walk belonging to $[N^{n-1}(A) - W(j)] + A$. Define $Y(A)$ as the
intersection of $X(A)$ and $B^n(A)$. Lastly define

$$S(j + 1) = \sum_A Y(A); \qquad W(j + 1) = W(j) + \sum_A X(A), \tag{3}$$

where $\sum_A$ denotes summation over all atoms $A$ belonging to $S(j)$. We shall
also require sets of atoms $T(1), T(2), \cdots$ defined recursively by

$$T(1) = A_1; \qquad T(j + 1) = \sum_{A \in T(j)} B^n(A), \quad j = 1, 2, \cdots. \tag{4}$$

Let $A$ be an atom in $S(j)$, supposed not empty, and let $B$ be an
atom of $T(j + 1)$. We write $A \to B$ to denote the existence of at least one
undammed self-avoiding walk from $A$ to $B$ belonging to $[N^{n-1}(A) - W(j)] + A$.
By the foregoing definitions, $A \to B$ implies that $B \in S(j + 1)$; and conversely
there exists an atom $A_j \in S(j)$ if and only if we can find atoms $A_1, A_2, \cdots,
A_{j-1}$ belonging to $S(1), S(2), \cdots, S(j - 1)$ such that $A_1 \to A_2 \to \cdots \to A_{j-1} \to
A_j$. Notice also that the sets $S(1), S(2), \cdots$ are mutually disjoint; and that
$S(1), S(2), \cdots, S(j)$ are all subsets of $W(j)$. Finally $S(j)$ is a subset of $T(j)$.
It may also be a subset of $T(k)$ for $k \ne j$; but this will not affect our argument.

If $A$ is an atom of the nonempty set $S(j)$, we define its score $\theta(A) = \sum_B \theta(B)$,
where $\sum_B$ denotes summation over all atoms $B \in S(j - 1)$ such that $B \to A$.
We begin this recursive definition from $\theta(A_1) = 1$ when $j = 1$. To the set $T(j)$
we attach the score

$$\phi_j = \begin{cases} \sum_{A \in S(j)} \theta(A) & \text{if } S(j) \text{ is not empty,} \\ 0 & \text{if } S(j) \text{ is empty.} \end{cases} \tag{5}$$

Suppose that $W(j - 1)$ is given, and that $S(j - 1)$ is not empty. This means
that $S(1), S(2), \cdots, S(j - 1)$ are all given and not empty. Hence $\theta(B)$ is given
for each $B \in S(j - 1)$. Consider the conditional expectation of $\phi_j$ given $W(j - 1)$
with nonempty $S(j - 1)$. We have


$$\begin{aligned}
E[\phi_j \mid W(j-1),\ S(j-1) \ne 0]
&= \sum_{A \in T(j),\, B \in S(j-1)} \theta(B)\, \mathrm{Prob}\,[B \to A \mid W(j-1)] \\
&= \sum_{B \in S(j-1)} \theta(B) \sum_{A \in T(j)} \mathrm{Prob}\,[B \to A \mid W(j-1)].
\end{aligned} \tag{6}$$

Since $B \to A$ involves the existence of at least one undammed self-avoiding
walk from $B$ to $A$ belonging to $[N^{n-1}(B) - W(j - 1)] + B$, this event depends
only upon the condition of bonds whose condition does not affect $W(j - 1)$.
Hence,

$$\mathrm{Prob}\,[B \to A \mid W(j-1)] \le \mathrm{Prob}\,[B \sim A], \tag{7}$$

where $B \sim A$, in the unconditional probability on the right of (7), means that
there is at least one undammed walk from $B$ to $A$ belonging to $N^{n-1}(B)$. Then,
by definition of $E_n(B, p)$ and $F_n(p)$, we have

$$\sum_{A \in T(j)} \mathrm{Prob}\,[B \sim A] = E_n(B, p) \le F_n(p). \tag{8}$$
Combination of (6), (7), and (8) yields

$$E[\phi_j \mid W(j-1),\ S(j-1) \ne 0] \le F_n(p) \sum_{B \in S(j-1)} \theta(B). \tag{9}$$

In (9), we can remove the condition $S(j - 1) \ne 0$, provided we interpret the
right-hand empty sum as zero when $S(j - 1) = 0$. Hence,

$$\begin{aligned}
E[\phi_j] &= \sum E[\phi_j \mid W(j-1)]\ \mathrm{Prob}\,[W(j-1)] \\
&\le F_n(p) \sum \sum_{B \in S(j-1)} \theta(B)\ \mathrm{Prob}\,[W(j-1)] = F_n(p)\, E[\phi_{j-1}].
\end{aligned} \tag{10}$$

Since every walk from $A_1$ to $B^N(A_1)$ contains at least $N$ steps, $A_1$ cannot wet
any atom of $B^N(A_1)$ unless none of $S(1), S(2), \cdots, S(\nu + 1)$ are empty, where
$\nu = [N/n]$. If $S(\nu + 1)$ is not empty, $\phi_{\nu+1} \ge 1$. Thus, by (10),

$$P_N \le \mathrm{Prob}\,[\phi_{\nu+1} \ge 1] \le E[\phi_{\nu+1}] \le F_n^{\nu} = F_n^{[N/n]}, \tag{11}$$

which is Theorem 2. The relation (11) is true but useless if $F_n \ge 1$.
If

$$F_n = F_n(p) < 1, \tag{12}$$

then

$$\lim_{N \to \infty} P_N = 0, \tag{13}$$

by (11). Since $N^N(A_1)$ contains only finitely many atoms, (13) implies that $A_1$
wets infinitely many atoms with probability zero. Therefore, by definition of
$p_d = p_d(A_1)$,

$$p \le p_d. \tag{14}$$

Since (14) is a consequence of (12), we deduce Theorem 1.

Theorem 3 is easy; for $A_1$ does not wet $B^N(A_1)$ unless there is at least one
undammed $N$-stepped self-avoiding walk from $A_1$. The probability of this event is less
than or equal to the expected number of such walks, namely $p^N f_A(N)$, because
all bonds of a self-avoiding walk are distinct. If $p < e^{-\lambda}$, $\lim_{N \to \infty} p^N f_A(N) = 0$ by
(1), and Theorem 3 follows.

To see that Theorem 1 is always as strong as Theorem 3, notice that, given
$\epsilon > 0$, there exists $n$ such that $f_A(m) < e^{(\lambda + \epsilon)m}$ for $m \ge n$; and hence $E_n(A, p) \le
\sum_{m \ge n} (p e^{\lambda + \epsilon})^m$. The right-hand side does not depend on $A$, so that we may
write $F_n(p)$ for $E_n(A, p)$. Then, if $p < e^{-\lambda - \epsilon}$, $F_n \to 0$ as $n \to \infty$; and the
result follows because $\epsilon$ is arbitrary.
NON-PARAMETRIC UP-AND-DOWN EXPERIMENTATION¹


By Cyrus DERMAN 
Columbia University 


1. Introduction. Let $Y(x)$ be a random variable such that $P(Y(x) = 1) = F(x)$
and $P(Y(x) = 0) = 1 - F(x)$ where $F(x)$ is a distribution function. It is
sometimes of interest, as in sensitivity experiments, to estimate a given quantile of
$F(x)$ with observations distributed like $Y(x)$ where the choice of $x$ is under
control. A procedure for estimating the median was suggested by Dixon and Mood
[2]. The validity of their procedure depends on the assumption that $F(x)$ is
normal. Robbins and Monro [6] suggested a general scheme which can be used
for estimating any quantile and which imposes no parametric assumptions on
$F(x)$. Their method does assume, however, that the range of possible
experimental values of $x$ is the real line. In practice, this will not be the case.
Limitations on the precision of measuring instruments, or natural limitations such as
when $x$ is obtained by a counting procedure, will usually restrict the experimental
range of $x$ to a set of numbers of the form

$$a + hn \qquad (-\infty < a < \infty,\ h > 0,\ n = 0, \pm 1, \cdots).$$

In this note we suggest a non-parametric procedure for estimating any quantile
of $F(x)$ on the basis of quantal response data when, experimentally, $x$ is restricted
to the form $a + hn$.

For convenience we assume $a = 0$, $h = 1$. Suppose we wish to estimate that
value of $x = \theta$ such that $F(\theta - 0) \le \alpha \le F(\theta)$, $\tfrac{1}{2} \le \alpha < 1$. If $0 < \alpha \le \tfrac{1}{2}$ or
$a \ne 0$ or $h \ne 1$ the necessary modifications will be apparent. The experimental
procedure is as follows: choose $x_1$ arbitrarily. Recursively, let

$$x_n = \begin{cases}
x_{n-1} - 1, & \text{with probability } \dfrac{1}{2\alpha} \text{ if } y_{n-1} = 1, \\[4pt]
x_{n-1} + 1, & \text{with probability } 1 - \dfrac{1}{2\alpha} \text{ if } y_{n-1} = 1, \\[4pt]
x_{n-1} + 1, & \text{with probability } 1 \text{ if } y_{n-1} = 0,
\end{cases} \tag{1}$$

where $y_n$ denotes the zero-or-one response at $x_n$. The estimate $\hat{\theta}_n$ of $\theta$ based on
$n$ observations is the most frequent value of $x$, if unique, or the arithmetic average
of the most frequent levels, if not unique.
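
The procedure (1) and the modal estimate can be sketched in a short simulation. This is illustrative only: the logistic response curve, its location and scale, the seed and the function names are all assumptions introduced for the example, not part of the note.

```python
import math
import random
from collections import Counter


def logistic_F(x, centre=10.0, scale=2.0):
    """Hypothetical response curve used only for this illustration."""
    return 1.0 / (1.0 + math.exp(-(x - centre) / scale))


def up_and_down_estimate(F, alpha, n_obs, x1=0, seed=0):
    """Run procedure (1) on the integer lattice and return the modal estimate."""
    rng = random.Random(seed)
    x = x1
    visits = Counter()
    for _ in range(n_obs):
        visits[x] += 1
        response = 1 if rng.random() < F(x) else 0   # zero-or-one response at level x
        if response == 1 and rng.random() < 1.0 / (2.0 * alpha):
            x -= 1          # step down with probability 1/(2*alpha) after a response
        else:
            x += 1          # otherwise step up
    top = max(visits.values())
    modes = [level for level, count in visits.items() if count == top]
    return sum(modes) / len(modes)   # average of the most frequent levels


if __name__ == "__main__":
    alpha = 0.75
    theta = 10.0 + 2.0 * math.log(alpha / (1 - alpha))   # F(theta) = alpha for this curve
    print(theta, up_and_down_estimate(logistic_F, alpha, n_obs=50_000))
```

Because the levels are restricted to integers, the estimate can only be expected to land within one lattice step of $\theta$, which is the accuracy asserted by the theorem that follows.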

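We shall prove the following

THEOREM. If $F(x)$ is strictly increasing for $\theta - 1 \le x \le \theta + 1$, then

$$P\left(\max\left(\left|\limsup_{n \to \infty} \hat{\theta}_n - \theta\right|,\ \left|\liminf_{n \to \infty} \hat{\theta}_n - \theta\right|\right) < 1\right) = 1.$$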


2. Two lemmas.
Let $\{X_n\}$ ($n = 0, 1, \cdots$) be an irreducible Markov chain with recurrent
non-null states and stationary transition probabilities $\{p_{ij}\}$ (see Feller [3] for
definitions of terms) such that

$$p_{i,i+1} + p_{i,i-1} = 1 \qquad (i = 0, \pm 1, \cdots). \tag{2}$$

Let $v_i$ ($i = 0, \pm 1, \cdots$) be the unique solution of the equations

$$\begin{aligned}
&\sum_{i=-\infty}^{\infty} v_i\, p_{ij} = v_j \qquad (j = 0, \pm 1, \cdots), \\
&v_i > 0 \ \text{for all } i, \\
&\sum_{i=-\infty}^{\infty} v_i = 1.
\end{aligned} \tag{3}$$

Since $\{X_n\}$ is irreducible and the states are recurrent non-null, the system (3)
has such a unique solution. The $v_i$'s play the role of stationary absolute
probabilities; i.e., if $P(X_0 = i) = v_i$, then $P(X_n = i) = v_i$ for every $n$.


LEMMA 1. If for some $i = b$, $p_{b,b+1} \le p_{b,b-1}$, $p_{b,b+1} > p_{b+1,b+2}$ and $p_{i,i+1}$ is
non-increasing in $i$ for $i \ge b + 1$, then $v_b > v_{b+1}$ and $v_i$ is non-increasing in $i$ for
$i \ge b + 1$. Similarly, if for some $i = c$, $p_{c,c-1} \le p_{c,c+1}$, $p_{c,c-1} > p_{c-1,c-2}$, and
$p_{i,i-1}$ is non-decreasing in $i$ for $i \le c - 1$, then $v_c > v_{c-1}$ and $v_i$ is non-decreasing
in $i$ for $i \le c - 1$.

Proof. Let $\pi_{ij} = P(X_n = j \text{ for some } n \ge 1,\ X_r \ne i \text{ or } j \text{ for } r < n \mid X_0 = i)$.
From a result of Harris [5] we know that

$$\frac{v_{i+1}}{v_i} = \frac{\pi_{i,i+1}}{\pi_{i+1,i}}. \tag{4}$$

It is clear however that $\pi_{i,i+1} = p_{i,i+1}$ and $\pi_{i+1,i} = p_{i+1,i}$. Hence, from (4)
and by the hypothesis

$$\frac{v_{b+1}}{v_b} = \frac{p_{b,b+1}}{p_{b+1,b}} = \frac{p_{b,b+1}}{1 - p_{b+1,b+2}} < \frac{p_{b,b+1}}{1 - p_{b,b+1}} = \frac{p_{b,b+1}}{p_{b,b-1}} \le 1,$$

and thus $v_{b+1} < v_b$. The remainder of the proof follows in the same manner.
Let $N_n(i)$ denote the number of $r$ such that $X_r = i$ for $r \le n$. For the truth of
the following lemma we need not impose the condition (2).


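LEMMA 2. Let $B$ be the set of states such that $v_{i'} = \max_i \{v_i\}$ for $i' \in B$. Then for
every $i' \in B$,

$$P\left(\lim_{n \to \infty} \frac{N_n(i')}{n} = v_{i'} > \lim_{n \to \infty} \max_{i \notin B} \left\{\frac{N_n(i)}{n}\right\}\right) = 1.$$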
Proof. Since $\sum_{i=-\infty}^{\infty} v_i = 1$, there exists a finite set $A$ of states with $B \subset A$
such that $\sum_{i \notin A} v_i < v_{i'}$. From the strong law of large numbers for Markov
chains [1], it follows that $P(\lim_{n \to \infty} N_n(i)/n = v_i) = 1$ for every $i$ and more
generally $P(\lim_{n \to \infty} \sum_{i \notin A} N_n(i)/n = \sum_{i \notin A} v_i) = 1$. Let $\epsilon$ be any number such
that $0 < \epsilon < v_{i'} - \max(\max_{i \in A - B} \{v_i\}, \sum_{i \notin A} v_i)$ and let $E_N$ denote the event
that $N_n(i')/n > v_{i'} - \epsilon$ for all $n > N$. By the previous remark and since
$\{E_N\}$ is a monotone sequence, $\lim_{N \to \infty} P(E_N) = P(\lim_{N \to \infty} E_N) = 1$. Therefore
there exists an $N_1$ such that $P(N_n(i')/n > v_{i'} - \epsilon$ for all $n > N_1) > 1 - \epsilon/3$.
Similarly, since $A$ is finite, there exists an $N_2$ such that $P(\max_{i \in A - B} \{N_n(i)/n\} <
v_{i'} - \epsilon$ for all $n > N_2) > 1 - \epsilon/3$ and an $N_3$ such that $P(\sum_{i \notin A} N_n(i)/n <
v_{i'} - \epsilon$ for all $n > N_3) > 1 - \epsilon/3$. Let $N_0 = \max(N_1, N_2, N_3)$. Then it follows
that

$$P\left(\frac{N_n(i')}{n} > v_{i'} - \epsilon > \max\left(\max_{i \in A - B} \left\{\frac{N_n(i)}{n}\right\},\ \sum_{i \notin A} \frac{N_n(i)}{n}\right) \text{ for all } n > N_0\right) > 1 - \epsilon.$$

Since $\epsilon > 0$ is arbitrary, we have

$$P\left(\lim_{n \to \infty} \frac{N_n(i')}{n} = v_{i'} > \limsup_{n \to \infty} \max_{i \notin B} \left\{\frac{N_n(i)}{n}\right\}\right) = 1.$$

The last assertion implies that $\lim_{n \to \infty} \max_i \{N_n(i)/n\}$ exists. By a similar
argument applied to the finite set $B_1$ of states which have the second largest
$v_i$'s it follows that $\limsup_{n \to \infty} \max_{i \notin B} \{N_n(i)/n\}$ can be replaced by
$\lim_{n \to \infty} \max_{i \notin B} \{N_n(i)/n\}$. The lemma is proved.


3. Application of lemmas. 
Let {X,} be the Markov chain defined by (1); i.e. let X, = iif z, = i. The 
transition probabilities are of the form 
F(i) 


Pin = 1- — 


2a ’ 
F(i) 


Pi = _” 

The chain is clearly irreducible and the states can be easily shown to be recurrent
non-null using a theorem of Harris [5] or a modified version of a theorem of Foster
[4]. The numbers $[\theta] + 1$ and $[\theta]$, where $[\theta]$ denotes the largest integer less
than or equal to $\theta$, can be taken as b and c of Lemma 1. From Lemma 1 and the
condition of strict monotonicity of $F(x)$ for $\theta - 1 \le x \le \theta + 1$, it is clear that
$[\theta]$ or $[\theta] + 1$ or both but no other states belong to B of Lemma 2. Thus, according
to Lemma 2, the most frequent state, for n large enough, will be $[\theta] + 1$,
$[\theta]$ or both with probability 1. In any case, the difference between $\theta$ and $[\theta] + 1$
or $[\theta]$ or the arithmetic average of the two is less than 1. The theorem is therefore
proved.
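
The conclusion can be illustrated numerically. The sketch below is not part of the paper: it assumes a nearest-neighbour chain with $p_{i,i+1} = 1 - F(i)/(2\alpha)$ and $p_{i,i-1} = F(i)/(2\alpha)$, a hypothetical logistic response curve $F$, a hypothetical level $\alpha = 0.7$ with $F(\theta) = \alpha$, and a finite window of states. Under those assumptions the stationary weights follow from detailed balance, and the state with the largest weight should be $[\theta]$ or $[\theta] + 1$.

```python
import math


def stationary_log_weights(F, alpha, lo, hi):
    """Unnormalised log stationary weights for the states lo..hi.

    Assumes p_{i,i+1} = 1 - F(i)/(2*alpha) and p_{i,i-1} = F(i)/(2*alpha),
    so detailed balance gives v_{i+1} / v_i = p_{i,i+1} / p_{i+1,i}.
    """
    logw = {lo: 0.0}
    for i in range(lo, hi):
        up = 1.0 - F(i) / (2.0 * alpha)      # p_{i,i+1}
        down = F(i + 1) / (2.0 * alpha)      # p_{i+1,i}
        logw[i + 1] = logw[i] + math.log(up) - math.log(down)
    return logw


def most_frequent_state(F, alpha, lo, hi):
    """State with the largest stationary weight inside the window."""
    logw = stationary_log_weights(F, alpha, lo, hi)
    return max(logw, key=logw.get)


def logistic(x, centre=2.0):
    """Hypothetical response curve, chosen only for illustration."""
    return 1.0 / (1.0 + math.exp(-(x - centre)))


alpha = 0.7                                       # hypothetical target level
theta = 2.0 + math.log(alpha / (1.0 - alpha))     # solves F(theta) = alpha
mode = most_frequent_state(logistic, alpha, lo=-20, hi=30)
print(theta, mode)
```

Working with log weights avoids overflow when the window is wide; the window limits themselves are arbitrary and only need to contain $\theta$ comfortably.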
APPROXIMATE MOMENTS FOR THE SERIAL
CORRELATION COEFFICIENT


By John S. White


Ball Brothers Co.


1. Introduction and summary. The first order Gaussian auto-regressive process
$(x_t)$ may be defined by the stochastic difference equation

(1)   $x_t = \rho x_{t-1} + u_t$,

where the $u$'s are NID(0, 1) and $\rho$ is an unknown parameter. The choice of a
statistic as an estimator for $\rho$ depends on the initial conditions imposed on the
difference equation (1). The so-called "circular" model is obtained by considering
a sample of size $N$ and then assuming that $x_{N+1} = x_1$. An appropriate estimator
for $\rho$ in this case is the circular serial correlation coefficient

(2)   $r = \dfrac{\sum_{t=1}^{N} x_t x_{t+1}}{\sum_{t=1}^{N} x_t^2}$,   $(x_{N+1} = x_1)$.

Leipnik [1] has derived an approximate density function

(3)   $f(r) = \dfrac{\Gamma(N/2 + 1)}{\Gamma(1/2)\,\Gamma((N+1)/2)}\,(1 - r^2)^{(N-1)/2}\,(1 - 2\rho r + \rho^2)^{-N/2}$

for the estimator $r$. Leipnik also evaluated the first two moments of this
distribution. In this paper a formula is obtained which gives $E(r^k)$ as a polynomial
of degree $k$ in $\rho$.


2. The general formula for $E(r^k)$. To calculate the moments of $r$ we must
evaluate the integral

(4)   $E(r^k) = \dfrac{\Gamma(N/2 + 1)}{\Gamma(1/2)\,\Gamma((N+1)/2)} \displaystyle\int_{-1}^{1} \dfrac{t^k (1 - t^2)^{(N-1)/2}}{(1 - 2t\rho + \rho^2)^{N/2}}\,dt$.

The direct integration of this function is not obvious; however, it can be evaluated
quite easily by means of the Gegenbauer polynomials.

Gegenbauer's function $C_j^{\nu}(t)$ for integral values of $j$ is defined to be the
coefficient of $\rho^j$ in the expansion of $(1 - 2t\rho + \rho^2)^{-\nu}$ in powers of $\rho$ (for this and the
following results concerning the Gegenbauer functions see Magnus and Oberhettinger
[2], pp. 77 and 78):

(5)   $(1 - 2t\rho + \rho^2)^{-\nu} = \displaystyle\sum_{j=0}^{\infty} C_j^{\nu}(t)\,\rho^j$.

The Gegenbauer polynomials are orthogonal over the interval (-1, 1) with
weight function $(1 - t^2)^{\nu - 1/2}$ and have the general properties of the classical
orthogonal polynomials.

One special result which we shall apply is the following. Let $g(t)$ be a continuous
function with $j$ continuous derivatives; then

(6)   $\displaystyle\int_{-1}^{1} g(t)\,(1 - t^2)^{\nu - 1/2}\,C_j^{\nu}(t)\,dt = K(\nu, j) \int_{-1}^{1} (1 - t^2)^{j + \nu - 1/2}\,\dfrac{d^j g(t)}{dt^j}\,dt$,

where

$K(\nu, j) = \dfrac{\Gamma(2\nu + j)\,\Gamma(\nu + \tfrac12)}{\Gamma(2\nu)\,\Gamma(\nu + j + \tfrac12)\,\Gamma(j + 1)\,2^j}$.

This result may be verified by applying the "Rodrigues Formula" for $C_j^{\nu}(t)$
(see [2], p. 78, line 2) to the left-hand side of (6) and then integrating by parts
$j$ times.

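Expanding the denominator of (4) in a series (5) we have

(7)   $E(r^k) = \dfrac{\Gamma(N/2 + 1)}{\Gamma(1/2)\,\Gamma((N+1)/2)} \displaystyle\int_{-1}^{1} t^k (1 - t^2)^{(N-1)/2} \Big[\sum_{j=0}^{\infty} C_j^{N/2}(t)\,\rho^j\Big]\,dt$.

Since, for this problem, $|\rho| < 1$ and $|t| \le 1$, (5) may be written, with $t = \cos\theta$, as

(8)    $(1 - 2\rho\cos\theta + \rho^2)^{-\nu} = (1 - \rho e^{i\theta})^{-\nu}(1 - \rho e^{-i\theta})^{-\nu}$
(8a)   $\qquad = \displaystyle\sum_{j=0}^{\infty} C_j^{\nu}(\cos\theta)\,\rho^j$.

Expanding the right-hand side of (8) in powers of $\rho$ as the product of two binomial
series and comparing coefficients of $\rho^j$ with those in (8a) we have

(9)   $|C_j^{\nu}(\cos\theta)| \le (-1)^j \dbinom{-2\nu}{j}$.

Hence, by the Weierstrass M-test, the series $\sum C_j^{\nu}(\cos\theta)\rho^j = \sum C_j^{\nu}(t)\rho^j$
converges uniformly in $t$.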
Since the series converges uniformly we may invert the order of integration
and summation in (7). Applying (6) to (7) with $g(t) = t^k$ and $\nu = N/2$, we have

(10)   $E(r^k) = \displaystyle\sum_{j=0}^{k} K(N/2, j)\,\dfrac{\Gamma(N/2 + 1)}{\Gamma(1/2)\,\Gamma((N+1)/2)}\,\rho^j \int_{-1}^{1} k(k-1)\cdots(k-j+1)\,t^{k-j}(1 - t^2)^{j + (N-1)/2}\,dt$.

We note that

$\displaystyle\int_{-1}^{1} t^p (1 - t^2)^q\,dt = 0$ for $p$ odd,

$\displaystyle\int_{-1}^{1} t^p (1 - t^2)^q\,dt = \dfrac{\Gamma\big(\tfrac{p+1}{2}\big)\,\Gamma(q + 1)}{\Gamma\big(\tfrac{p+1}{2} + q + 1\big)}$ for $p$ even.
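If we now let $2i = k - j$ ($i = 0, 1, \dots, [k/2]$), (10) becomes

(11)   $E(r^k) = \displaystyle\sum_{i=0}^{[k/2]} \dfrac{\Gamma(N + k - 2i)\,\Gamma(k + 1)\,\Gamma(N/2 + 1)\,\Gamma(i + \tfrac12)\,\rho^{k-2i}}{2^{k-2i}\,\Gamma(N)\,\Gamma(k - 2i + 1)\,\Gamma(2i + 1)\,\Gamma(\tfrac12)\,\Gamma\big(\tfrac{N + 2k + 2}{2} - i\big)}$.

Applying the multiplication theorem for the gamma function

$2^{2p}\,\Gamma(p + \tfrac12)\,\Gamma(p + 1) = \Gamma(2p + 1)\,\Gamma(1/2)$,

we find

(12)   $E(r^k) = \displaystyle\sum_{i=0}^{[k/2]} \dfrac{\Gamma(N + k - 2i)\,\Gamma(k + 1)\,\Gamma(N/2 + 1)\,\rho^{k-2i}}{2^{k}\,\Gamma(N)\,\Gamma(k - 2i + 1)\,\Gamma\big(\tfrac{N + 2k + 2}{2} - i\big)\,\Gamma(i + 1)}$.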


The above formula may be simplified by considering separately the cases
$k$ even and $k$ odd. Setting $2j = k$, (12) becomes

(13)   $E(r^{2j}) = \displaystyle\sum_{i=0}^{j} \dfrac{\Gamma(N + 2j - 2i)\,\Gamma(2j + 1)\,\Gamma(N/2 + 1)\,\rho^{2j-2i}}{2^{2j}\,\Gamma(N)\,\Gamma(2j - 2i + 1)\,\Gamma\big(\tfrac{N + 4j + 2}{2} - i\big)\,\Gamma(i + 1)}$.

Setting $p = j - i$ and applying the multiplication theorem again, (13) may be
written as²

(14)   $E(r^{2j}) = \displaystyle\sum_{p=0}^{j} \dfrac{\Gamma(N + 2p)\,\Gamma(j + \tfrac12)\,\Gamma(j + 1)\,\Gamma(N/2 + 1)\,\rho^{2p}}{2^{2p}\,\Gamma(N)\,\Gamma(p + \tfrac12)\,\Gamma(p + 1)\,\Gamma(j - p + 1)\,\Gamma\big(\tfrac{N + 2j + 2p + 2}{2}\big)}$

or

(15)   $E(r^{2j}) = \displaystyle\sum_{p=0}^{j} \dfrac{(2j)!\,\{N(N+1)(N+2)\cdots(N + 2p - 1)\}\,\rho^{2p}}{2^{j-p}\,(2p)!\,(j-p)!\,(N+2)(N+4)\cdots(N + 2j + 2p)}$.

2 For p = 0 the expression in the braces { } in (15) is to be taken as 1.
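The corresponding results for $k$ odd, $k = 2j + 1$, are

(16)   $E(r^{2j+1}) = \displaystyle\sum_{p=0}^{j} \dfrac{\Gamma(N + 2p + 1)\,\Gamma(2j + 2)\,\Gamma(N/2 + 1)\,\rho^{2p+1}}{2^{2j+1}\,\Gamma(N)\,\Gamma(2p + 2)\,\Gamma(j - p + 1)\,\Gamma\big(\tfrac{N + 2j + 2p + 4}{2}\big)}$,

(17)   $E(r^{2j+1}) = \displaystyle\sum_{p=0}^{j} \dfrac{(2j+1)!\,N(N+1)(N+2)\cdots(N + 2p)\,\rho^{2p+1}}{2^{j-p}\,(2p+1)!\,(j-p)!\,(N+2)(N+4)\cdots(N + 2j + 2p + 2)}$.

From (15) and (17) we see that

(18)   $\displaystyle\lim_{N\to\infty} E(r^k) = \rho^k$, for all $k$.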


3. Specific moments of r. Direct substitution in (15) and (17) yields the following:

(19)
$E(r) = \dfrac{N\rho}{N+2}$,

$E(r^2) = \dfrac{1}{N+2} + \dfrac{N(N+1)\rho^2}{(N+2)(N+4)}$,

$E(r^3) = \dfrac{3N\rho}{(N+2)(N+4)} + \dfrac{N(N+1)(N+2)\rho^3}{(N+2)(N+4)(N+6)}$,

$E(r^4) = \dfrac{3}{(N+2)(N+4)} + \dfrac{6N(N+1)\rho^2}{(N+2)(N+4)(N+6)} + \dfrac{N(N+1)(N+2)(N+3)\rho^4}{(N+2)(N+4)(N+6)(N+8)}$.

The first two moments agree with those obtained by Leipnik, who evaluated
them by another method.
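
As a check on the general formulas, the following sketch evaluates (15) and (17) in exact rational arithmetic. It is an illustration only; the function names are not from the paper, and the two cases are merged into one loop by writing $k = 2j$ or $k = 2j + 1$ and letting the power of $\rho$ in each term be $2p$ or $2p + 1$.

```python
from fractions import Fraction
from math import factorial


def rising(n, m):
    """n (n+1) ... (n+m-1); the empty product (m = 0) is 1."""
    out = Fraction(1)
    for i in range(m):
        out *= n + i
    return out


def even_steps(n, m):
    """(n+2)(n+4) ... (n+2m); the empty product (m = 0) is 1."""
    out = Fraction(1)
    for i in range(1, m + 1):
        out *= n + 2 * i
    return out


def moment(k, N, rho):
    """E(r^k) from (15) when k = 2j and from (17) when k = 2j + 1."""
    rho = Fraction(rho)
    j, odd = divmod(k, 2)
    total = Fraction(0)
    for p in range(j + 1):
        q = 2 * p + odd                      # power of rho in this term
        num = factorial(k) * rising(N, q) * rho ** q
        den = (2 ** (j - p) * factorial(q) * factorial(j - p)
               * even_steps(N, j + p + odd))
        total += num / den
    return total


N, rho = 10, Fraction(1, 2)
for k in range(1, 5):
    print(k, moment(k, N, rho))
```

Using `Fraction` keeps every term exact, so the output can be compared directly with the closed forms listed in (19).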
The central moments of r are

(20)
$\mu_2 = \dfrac{1}{N+2} - \dfrac{N(N-2)\rho^2}{(N+2)^2(N+4)}$,

$\mu_3 = \dfrac{-6N\rho}{(N+2)^2(N+4)} + \dfrac{2N(N-2)(3N-2)\rho^3}{(N+2)^3(N+4)(N+6)}$,

$\mu_4 = \dfrac{3}{(N+2)(N+4)} - \dfrac{6N(N^2 - 8N - 4)\rho^2}{(N+2)^3(N+4)(N+6)} + \dfrac{3N(N^4 - 16N^3 + 40N^2 - 32N + 16)\rho^4}{(N+2)^4(N+4)(N+6)(N+8)}$.
For large values of N the variance, skewness and kurtosis of r are

(21)
$\sigma^2 = \dfrac{1 - \rho^2}{N} + O(N^{-2})$,

$\sqrt{\beta_1} = \mu_3/\sigma^3 = O(N^{-1/2})$,

$\beta_2 = \mu_4/\sigma^4 = 3 + O(N^{-1})$.

These last results are to be expected since it is well known that r has an
asymptotic normal distribution.
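
The central moments and their large-$N$ behaviour can be checked from the raw moments. The sketch below is an illustration rather than the author's computation: it takes $E(r), \dots, E(r^4)$ in the closed forms of Section 3, forms $\mu_2, \mu_3, \mu_4$ by the usual binomial expansions, and compares them with closed-form expressions obtained by simplifying those expansions. The chosen values of $N$ and $\rho$ are arbitrary.

```python
from fractions import Fraction as Fr


def raw_moments(N, rho):
    """E(r), E(r^2), E(r^3), E(r^4) in closed form (common factors cancelled)."""
    m1 = Fr(N, N + 2) * rho
    m2 = Fr(1, N + 2) + Fr(N * (N + 1), (N + 2) * (N + 4)) * rho ** 2
    m3 = (Fr(3 * N, (N + 2) * (N + 4)) * rho
          + Fr(N * (N + 1), (N + 4) * (N + 6)) * rho ** 3)
    m4 = (Fr(3, (N + 2) * (N + 4))
          + Fr(6 * N * (N + 1), (N + 2) * (N + 4) * (N + 6)) * rho ** 2
          + Fr(N * (N + 1) * (N + 3), (N + 4) * (N + 6) * (N + 8)) * rho ** 4)
    return m1, m2, m3, m4


def central_moments(N, rho):
    """mu_2, mu_3, mu_4 from the raw moments by binomial expansion."""
    m1, m2, m3, m4 = raw_moments(N, rho)
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    return mu2, mu3, mu4


def central_closed_form(N, rho):
    """Simplified closed forms for mu_2, mu_3, mu_4."""
    mu2 = Fr(1, N + 2) - Fr(N * (N - 2), (N + 2) ** 2 * (N + 4)) * rho ** 2
    mu3 = (-Fr(6 * N, (N + 2) ** 2 * (N + 4)) * rho
           + Fr(2 * N * (N - 2) * (3 * N - 2),
                (N + 2) ** 3 * (N + 4) * (N + 6)) * rho ** 3)
    mu4 = (Fr(3, (N + 2) * (N + 4))
           - Fr(6 * N * (N * N - 8 * N - 4),
                (N + 2) ** 3 * (N + 4) * (N + 6)) * rho ** 2
           + Fr(3 * N * (N ** 4 - 16 * N ** 3 + 40 * N ** 2 - 32 * N + 16),
                (N + 2) ** 4 * (N + 4) * (N + 6) * (N + 8)) * rho ** 4)
    return mu2, mu3, mu4


rho = Fr(1, 3)
big_N = 10_000
mu2, mu3, mu4 = central_moments(big_N, rho)
print(float(big_N * mu2), float(1 - rho ** 2))   # variance times N vs 1 - rho^2
print(float(mu4 / mu2 ** 2))                     # kurtosis, close to 3
```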


4. Final remarks. The above results should be adequate, as Leipnik has suggested,
for serial correlation problems when $N \ge 20$. In particular the expressions
for the moments of r will be of assistance in evaluating the moments of
functions of r; for example, the variance stabilizing transformation $z = \sin^{-1} r$
which will be treated in a future paper.
ON A DECISION PROCEDURE BASED ON THE TUKEY
STATISTIC


By K. V. Ramachandran and C. G. Khatri


University of Baroda


1. Summary. In this paper a decision procedure based on the Tukey Studentized
range ([5], [6], [8]) has been shown to be an optimum procedure for a
particular type of slippage of means of univariate normal populations based
on a common but unknown variance. The method given here is similar to that
used by Paulson [2] and Truax [7].


2. Introduction. Let $x_{ij}$ ($i = 1, 2, \dots, k$; $j = 1, 2, \dots, n$) be the elements
of $k$ independent samples of size $n$ from normal populations with means $\mu_i$ and
variance $\sigma^2$ ($i = 1, 2, \dots, k$). Let

$\bar{x}_i = \sum_{j=1}^{n} (x_{ij}/n)$,   $s^2 = \sum_{i=1}^{k}\sum_{j=1}^{n} (x_{ij} - \bar{x}_i)^2 / k(n-1)$,

$\bar{x}_{\max} = \max(\bar{x}_1, \bar{x}_2, \dots, \bar{x}_k)$ and $\bar{x}_{\min} = \min(\bar{x}_1, \bar{x}_2, \dots, \bar{x}_k)$. Let $D_{00}$ denote
the decision that the $k$ means are all equal, and let

$D_{ij}$ ($i \ne j$; $i, j = 1, 2, \dots, k$)

denote the decision that $D_{00}$ is incorrect and $\mu_i = \mu_{\min}$ and $\mu_j = \mu_{\max}$. We will
say that the pair $(\mu_i, \mu_j)$ has slipped by an amount $\Delta$ ($\Delta > 0$) if
$\mu_1 = \mu_2 = \dots = \mu_{i-1} = \mu_{i+1} = \dots = \mu_{j-1} = \mu_{j+1} = \dots = \mu_k = \mu$ (say) and
$\mu_i = \mu - \Delta$ and $\mu_j = \mu + \Delta$. The first formulation of the problem is the following: to find a
statistical procedure for selecting one of the decisions $(D_{00}, D_{ij})$ ($i \ne j$; $i, j = 1, 2, \dots, k$)
which will maximize the probability of making the correct decision
when some pair has slipped, subject to the following restriction: (a) when all
the means are equal, $D_{00}$ should be selected with probability $1 - \alpha$ (where $\alpha$
is some small positive quantity fixed in advance of the experiment).

Since the class of possible decision procedures seems to be too large to admit
an optimum solution we will impose the following restrictions: (b) the decision
procedure must be invariant under location and scale transformations of the
variates; (c) the decision procedure must be symmetric in the sense that the
probability of making the correct decision when the pair $(\mu_i, \mu_j)$ has slipped
by an amount $\Delta$ must be the same for all $i, j = 1, 2, \dots, k$; $i \ne j$. These additional
restrictions are rather weak and seem to be reasonable requirements to
impose in many practical problems. We will now reformulate the problem as
follows: we want a statistical procedure for selecting one of the decisions

$(D_{00}, D_{ij})$ ($i \ne j$; $i, j = 1, 2, \dots, k$)

which, subject to conditions (a), (b) and (c), will maximize the probability of
making the correct decision when one of the pairs has slipped. We shall prove
that the optimum solution is the following: if

(1)   $\bar{x}_i = \bar{x}_{\min}$, $\bar{x}_j = \bar{x}_{\max}$, and $\dfrac{n(\bar{x}_j - \bar{x}_i)}{[(nk - 1)s_0^2]^{1/2}} > q_\alpha$,

select $D_{ij}$; if

$\dfrac{n(\bar{x}_j - \bar{x}_i)}{[(nk - 1)s_0^2]^{1/2}} \le q_\alpha$,

select $D_{00}$, where $q_\alpha$ is a constant whose value is determined by restriction
(a), and $(nk - 1)s_0^2 = \sum_{i=1}^{k}\sum_{j=1}^{n} (x_{ij} - \bar{x})^2$. This statistic has been suggested,
on intuitive grounds, by Tukey [8]. Roy and Bose [6] have shown that the
statistic can be derived by the union-intersection principle of test construction.
Tables of the distribution of $q$ for different values of $\alpha$, $n$, and $k$ are available
([1], [8], [4]).
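
The rule just stated can be sketched in a few lines. This is an illustration only, not the authors' code: the function name is hypothetical, the critical value `q_alpha` must be supplied by the caller (for example from the published tables), and the numerical thresholds in the usage lines are arbitrary values chosen to show both outcomes.

```python
from math import sqrt


def tukey_slippage_decision(samples, q_alpha):
    """Apply the rule to k samples of common size n.

    Returns None for the decision that all means are equal, otherwise the
    1-based pair (i, j) where sample i has the smallest mean and sample j
    the largest. q_alpha is an input, not computed here.
    """
    k = len(samples)
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must have the same size n")
    means = [sum(s) / n for s in samples]
    grand = sum(means) / k
    # Total sum of squares about the grand mean, i.e. (nk - 1) * s_0^2.
    total_ss = sum((x - grand) ** 2 for s in samples for x in s)
    if total_ss == 0:
        return None
    i = min(range(k), key=means.__getitem__)
    j = max(range(k), key=means.__getitem__)
    statistic = n * (means[j] - means[i]) / sqrt(total_ss)
    return (i + 1, j + 1) if statistic > q_alpha else None


data = [[1, 2, 3], [4, 5, 6], [10, 11, 12]]
print(tukey_slippage_decision(data, q_alpha=2.0))   # pair with extreme means
print(tukey_slippage_decision(data, q_alpha=2.4))   # no slippage declared
```

Because the total sum of squares is at least $n$ times the between-sample sum of squares, the statistic cannot exceed $\sqrt{2n}$, which gives a quick sanity bound on any tabled critical value.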


3. Derivation of the optimum procedure. Since $(\bar{x}_1, \bar{x}_2, \dots, \bar{x}_k, s^2)$ constitute
a set of sufficient statistics for the unknown parameters $(\mu_1, \mu_2, \dots, \mu_k, \sigma)$
there is no loss in considering only procedures which depend on this set
of statistics. Making use of this in connection with restriction (b) it is easy to
see that any allowable decision procedure will depend only on the $k - 1$ statistics
$(\bar{x}_i - \bar{x}_k)/s$, ($i = 1, 2, \dots, k - 1$). Let $w_i = (\bar{x}_i - \bar{x}_k)/s$, ($i = 1, 2, \dots, k - 1$)
and let $\lambda_i = (\mu_i - \mu_k)/\sigma$, ($i = 1, 2, \dots, k - 1$). The joint distribution
of the set $(w_1, w_2, \dots, w_{k-1})$ depends only on the parameters $(\lambda_1, \lambda_2, \dots, \lambda_{k-1})$.
Let $\bar{D}_{00}$ be the decision that $\lambda_1 = \lambda_2 = \dots = \lambda_{k-1} = 0$ and let $\bar{D}_{ij}$ be the
decision that $\lambda_1 = \lambda_2 = \dots = \lambda_{i-1} = \lambda_{i+1} = \dots = \lambda_{j-1} = \lambda_{j+1} = \dots = \lambda_{k-1} = 0$,
$\lambda_i = -\Delta/\sigma$ and $\lambda_j = \Delta/\sigma$, ($i \ne j$; $i, j = 1, 2, \dots, k - 1$),
while $\bar{D}_{ik}$ denotes the decision that $\lambda_1 = \lambda_2 = \dots = \lambda_{i-1} = \lambda_{i+1} = \dots = \lambda_{k-1} = -\Delta/\sigma$,
$\lambda_i = -2\Delta/\sigma$, ($i = 1, 2, \dots, k - 1$) and $\bar{D}_{ki}$ denotes the
decision that $\lambda_1 = \lambda_2 = \dots = \lambda_{i-1} = \lambda_{i+1} = \dots = \lambda_{k-1} = \Delta/\sigma$, $\lambda_i = 2\Delta/\sigma$
($i = 1, 2, \dots, k - 1$). Since any allowable decision procedure for selecting
one of the set $(D_{00}, D_{ij})$ ($i \ne j$; $i, j = 1, 2, \dots, k$) must be a function only of
$(w_1, w_2, \dots, w_{k-1})$, it can be transformed into a procedure for selecting one of
the set $(\bar{D}_{00}, \bar{D}_{ij})$, ($i \ne j$; $i, j = 1, 2, \dots, k$) by making $D_{ij}$ correspond to
$\bar{D}_{ij}$, ($i, j = 0, 1, 2, \dots, k$; $i \ne j$ if $i, j > 0$), i.e. whenever the original decision
procedure selects $D_{ij}$, the transformed decision procedure is to select $\bar{D}_{ij}$.
Because of restriction (a), the probability that any transformed allowable decision
procedure will select $\bar{D}_{00}$ when $\lambda_1 = \lambda_2 = \dots = \lambda_{k-1} = 0$ will be equal to
$1 - \alpha$; in addition the probability that any allowable decision procedure will
select $D_{ij}$ when the pair $(\mu_i, \mu_j)$ has slipped is equal to the probability that the
transformed procedure selects $\bar{D}_{ij}$ when $\bar{D}_{ij}$ is the correct decision, and this last
probability must be the same for each $(i, j)$ because of restriction (c).

The proof that (1) is the optimum solution consists mainly in showing that
for any $\Delta$ and $\sigma$ there exists a set of nonzero a priori probabilities $g_{00}$, $g_{ij}$, ($i \ne j$;
$i, j = 1, 2, \dots, k$) which are functions of $\Delta$ and $\sigma$ so that when (1) is transformed
in the manner indicated above into a decision procedure for selecting
one of $(\bar{D}_{00}, \bar{D}_{ij})$, ($i \ne j$; $i, j = 1, 2, \dots, k$), it will maximize the probability
of making the correct decision among the set $(\bar{D}_{00}, \bar{D}_{ij})$, ($i \ne j$; $i, j = 1, 2, \dots, k$)
when $g_{ij}$ is the probability that $\bar{D}_{ij}$ is the correct decision ($i, j = 0, 1,
2, \dots, k$; $i \ne j$ if $i, j > 0$).

Assuming this has been demonstrated, it follows easily that (1) must be the
optimum solution. For suppose there existed an allowable decision procedure
$D^*$, which for some $\Delta$ and $\sigma$ had a greater probability than (1) of making the
correct decision when some pair had slipped. Then $D^*$, which must only be a
function of $(w_1, w_2, \dots, w_{k-1})$, when transformed in the indicated manner into
a decision procedure for selecting one of $(\bar{D}_{00}, \bar{D}_{ij})$, ($i \ne j$; $i, j = 1, 2, \dots, k$)
will have greater probability than (1) of making the correct decision among
$(\bar{D}_{00}, \bar{D}_{ij})$, ($i \ne j$; $i, j = 1, 2, \dots, k$) with respect to any set of nonzero a priori
probabilities, which would be a contradiction.

To show that the required a priori distribution exists, first let $u_i = (\bar{x}_i - \bar{x}_k)/\sigma$,
($i = 1, 2, \dots, k - 1$) so that $w_i = u_i\sigma/s$.

The joint probability density function $f(w_1, w_2, \dots, w_{k-1})$ of $w_1, w_2, \dots, w_{k-1}$
is easily found to be given by

(2)   $f(w_1, w_2, \dots, w_{k-1}) = C \displaystyle\int_0^{\infty} t^{n'+k-2} \exp\Big\{-\tfrac12\Big[n't^2 + A\sum_{i=1}^{k-1}(w_i t - \lambda_i)^2 + B\sum_{i \ne j}(w_i t - \lambda_i)(w_j t - \lambda_j)\Big]\Big\}\,dt$,

where $n' = k(n - 1)$, $A = n(k - 1)/k$, $B = -n/k$, and $C$ is a constant whose
precise value is not needed.
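
The constants $A$ and $B$ are the diagonal and off-diagonal entries of the inverse covariance matrix of $u_i = (\bar{x}_i - \bar{x}_k)/\sigma$. The following sketch, which is an illustration rather than part of the derivation, assumes $\mathrm{Var}(u_i) = 2/n$ and $\mathrm{Cov}(u_i, u_j) = 1/n$ for $i \ne j$ (independent sample means with variance $\sigma^2/n$), takes $A = n(k-1)/k$ and $B = -n/k$, and confirms exactly that the two matrices multiply to the identity.

```python
from fractions import Fraction


def covariance_u(k, n):
    """Covariance of u_i = (xbar_i - xbar_k)/sigma, i = 1..k-1: (I + J)/n."""
    m = k - 1
    return [[Fraction(2 if r == c else 1, n) for c in range(m)] for r in range(m)]


def precision_from_A_B(k, n):
    """Matrix with A = n(k-1)/k on the diagonal and B = -n/k elsewhere."""
    A = Fraction(n * (k - 1), k)
    B = Fraction(-n, k)
    m = k - 1
    return [[A if r == c else B for c in range(m)] for r in range(m)]


def matmul(P, Q):
    """Plain matrix product for lists of lists."""
    return [[sum(P[r][t] * Q[t][c] for t in range(len(Q)))
             for c in range(len(Q[0]))] for r in range(len(P))]


k, n = 4, 5                      # arbitrary illustrative sizes
product = matmul(covariance_u(k, n), precision_from_A_B(k, n))
print(product)
```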

Let $f_{ij} = f(w_1, \dots, w_{k-1} \mid \bar{D}_{ij})$, ($i, j = 0, 1, 2, \dots, k$; $i \ne j$ if $i, j > 0$)
be the joint probability density function of $w_1, w_2, \dots, w_{k-1}$ when $\bar{D}_{ij}$ is the
correct decision. The decision procedure which will maximize the probability
of making the correct decision among the set $(\bar{D}_{00}, \bar{D}_{ij})$, ($i \ne j$; $i, j = 1, 2, \dots, k$)
when the a priori probability distribution is $(p_{00}, p_{12}, \dots, p_{k,k-1})$, i.e. the
Bayes solution with respect to $(p_{00}, p_{12}, \dots, p_{k,k-1})$, is known [9] to be given
by the rule: for each $i, j$ ($i, j = 0, 1, 2, \dots, k$; $i \ne j$ if $i, j > 0$) select $\bar{D}_{ij}$ for
all points in the $w$ space where $p_{ij}f_{ij} = \max(p_{00}f_{00}, p_{12}f_{12}, \dots, p_{k,k-1}f_{k,k-1})$.
For the problem at hand, this is the unique Bayes solution except possibly for a
set of measure zero according to all $f_{ij}$. Using (2) it is easy to calculate for each
$i, j$ the region where $\bar{D}_{ij}$ is selected for the special a priori distribution
$p_{00} = 1 - k(k - 1)p$, $p_{12} = \dots = p_{k,k-1} = p$.

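It can be easily checked that the Bayes solution is the following procedure:
Select $\bar{D}_{ij}$ ($i, j = 1, 2, \dots, k - 1$; $i \ne j$) if

(3)   $(w_j - w_i) > (w_{j'} - w_{i'})$   ($i', j' = 1, 2, \dots, k - 1$; $i' \ne j'$ and $i \ne i'$, $j \ne j'$ simultaneously),

      $(w_j - w_i) > |w_{i'}|$   and   $\dfrac{(w_j - w_i)(A - B)}{\sqrt{n' + A\sum_{r=1}^{k-1} w_r^2 + B\sum_{r \ne t} w_r w_t}} > g$.

Select $\bar{D}_{ik}$ ($i = 1, 2, \dots, k - 1$) if

(4)   $-w_i > (w_{j'} - w_{i'})$   ($i', j' = 1, 2, \dots, k - 1$; $i' \ne j'$)

      and   $\dfrac{-w_i(A - B)}{\sqrt{n' + A\sum_{r=1}^{k-1} w_r^2 + B\sum_{r \ne t} w_r w_t}} > g$.

Select $\bar{D}_{ki}$ ($i = 1, 2, \dots, k - 1$) if

(5)   $w_i > (w_{j'} - w_{i'})$   ($i', j' = 1, 2, \dots, k - 1$; $i' \ne j'$)

      and   $\dfrac{w_i(A - B)}{\sqrt{n' + A\sum_{r=1}^{k-1} w_r^2 + B\sum_{r \ne t} w_r w_t}} > g$,

and select $\bar{D}_{00}$ otherwise.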
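Define the function $F(p)$ by the equation

(6)   $F(p) = \displaystyle\int_0^{\infty} y^{n'+k-2} \exp\Big(-\frac{y^2}{2}\Big)\Big\{p\,\exp\Big(-\frac{n\Delta^2}{\sigma^2}\Big)\exp\Big(\frac{\Delta q_\alpha y}{\sigma}\Big) - (1 - k(k - 1)p)\Big\}\,dy$,

where $q_\alpha$ is the constant used in (1).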
It is obvious that $F(p)$ is a continuous function of $p$ with $F(0) < 0$ and
$F(1/k(k - 1)) > 0$. Hence there exists a $p^*$ with $0 < p^* < 1/k(k - 1)$ which
is a function of $\Delta/\sigma$ so that $F(p^*) = 0$. Once the Bayes solution relative to
$[1 - k(k - 1)p, p, \dots, p]$ has been worked out, it is obvious that to get the
Bayes solution relative to $[1 - k(k - 1)p^*, p^*, \dots, p^*]$ it is only necessary to
replace $g$ by $q_\alpha$. If we now substitute $w_i = (\bar{x}_i - \bar{x}_k)/s$ and replace $A$ and $B$
by their values, we find after some simplifications that the Bayes solution relative
to $[1 - k(k - 1)p^*, p^*, \dots, p^*]$ reduces to (1) when $D_{ij}$ is made to correspond
to $\bar{D}_{ij}$, ($i, j = 0, 1, 2, \dots, k$; $i \ne j$ if $i, j > 0$). Since (1) is an allowable
procedure, this proves that it is an optimum one.
ESTIMATES OF THE MEAN AND STANDARD DEVIATION OF
A NORMAL POPULATION


By W. J. Dixon
University of California, Los Angeles


0. Summary. Several simple estimates of the mean and standard deviation
of a normal population are discussed. The efficiencies of these estimates are
compared to the sample mean and sample standard deviation and to the best
linear unbiased estimates. Little efficiency is lost when simple rather than optimum
weights are used.
TABLE I

Several Estimates of Mean of Normal Population with Efficiencies. (Variances to
be multiplied by $\sigma^2$.)

[Table body not recoverable from the scan. The one legible column heading is $(X_i + X_j)/2$, the mean of the best two.]

* A = Var(BLSS)/Var($\bar{X}_{(2,N-1)}$).


Since moments of the order statistics are now available for samples of sizes
$N \le 20$ from normal populations [3] it is a simple matter to find the variances
of linear combinations of order statistics. The sample values are denoted
$X_1 \le X_2 \le X_3 \le \dots \le X_N$.


1. Estimates of the mean. Table I gives the variance and efficiency of the
following estimates of the population mean: (a) median, (b) midrange, (c) mean
of best two, and (d) $\bar{X}_{(2,N-1)} = \sum_{i=2}^{N-1} X_i/(N - 2)$. The median and midrange
are given primarily for comparison purposes, since results are well known. The
median is defined as $X_{(N+1)/2}$ for $N$ odd and as $\tfrac12(X_{N/2} + X_{N/2+1})$ for $N$ even.
The mean of the best two (here "best" is used in the sense of minimum variance)
is reported as the small sample equivalent of the estimate commonly used in
large samples, the mean of the 27th and 73rd percentiles. It can be seen from
Table I that for sample sizes larger than five, the particular ordered observations
indicated are not far from the 27th and 73rd percentiles, and efficiencies
are close to the asymptotic efficiency (0.810). The efficiency reported for the
midrange, median, and the mean of best two is the ratio of the variance of the
arithmetic mean to the variance of the statistic.

Estimate (d) above is the mean of all observations except the largest and
TABLE II
A linear estimate of the standard deviation. (Variances to be multiplied by $\sigma^2$.)

 N | Eff. of range | Estimate
 2 | 1.000 | 0.8862w
 3 | 0.992 | 0.5908w
 4 | 0.975 | 0.4857w
 5 | 0.955 | 0.4299w
 6 | 0.933 | 0.2619(w + w_(2))
 7 | 0.911 | 0.2370(w + w_(2))
 8 | 0.890 | 0.2197(w + w_(2))
 9 | 0.869 | 0.2068(w + w_(2))
10 | 0.850 | 0.1968(w + w_(2))
11 | 0.831 | 0.1608(w + wi) + wey)
12 | 0.814 | 0.1524(w + wi) + way)
13 | 0.797 | 0.1456(w + wi) + way)
14 | 0.781 | 0.1399(w + wa) + wey)
15 | 0.766 | 0.1352(w + we) + wey)
16 | 0.751 | 0.1311(w + wey + we)
17 | 0.738 | 0.1050(w + wi) + wiay + wis)
18 | 0.725 | 0.1020(w + wi) + wis) + wis)
19 | 0.712 | 0.09939(w + wie) + wes) + wis)
20 | 0.700 | 0.10446(w + wi) + wey + W<6))

[The variance, Eff., and B columns are not recoverable from the scan; the subrange subscripts for N of 11 and above are left as scanned.]

* B = Var(BLSS)/Var(s').


smallest. Interest in this statistic arises when the extreme observations are
poorly determined or not available. References [1] and [2] refer to this condition
as doubly censored and develop best linear systematic statistics (BLSS) for various
amounts of single and double censoring of the sample. The decrease in efficiency
of the simple unweighted mean $\bar{X}_{(2,N-1)}$ compared to the BLSS based on
the same observations is not great. In no case is the loss in efficiency more than
1 per cent. This can be noted from the ratio Var(BLSS)/Var($\bar{X}_{(2,N-1)}$) given in
Table I, since this ratio is never less than 0.990. It seems likely that for many
applications, one could dispense with the use of unequal weights for the systematic
statistics in this case. It can be seen that the efficiency is not greatly
affected by the use of coefficients differing greatly from the optimum. The
column headed "Eff." is the efficiency of $\bar{X}_{(2,N-1)}$ compared with the mean of all
observations and is approximately the same for the BLSS.
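
The simple estimates of the mean discussed in this section are easy to compute. The sketch below is an illustration with hypothetical function names; it covers the median, the midrange, and the mean of all observations except the smallest and largest. The mean of the best two is omitted because the ranks to use depend on the sample size and come from Table I.

```python
def median(xs):
    """Middle order statistic for odd N, mean of the two middle ones for even N."""
    s = sorted(xs)
    n = len(s)
    h = n // 2
    return s[h] if n % 2 else (s[h - 1] + s[h]) / 2


def midrange(xs):
    """Mean of the smallest and largest observations."""
    return (min(xs) + max(xs)) / 2


def trimmed_mean(xs):
    """Mean of all observations except the smallest and the largest."""
    s = sorted(xs)
    if len(s) < 3:
        raise ValueError("need at least three observations")
    return sum(s[1:-1]) / (len(s) - 2)


sample = [2.0, 9.0, 4.0, 5.0, 3.0]       # made-up data
print(median(sample), midrange(sample), trimmed_mean(sample))
```

On this made-up sample the single large value 9.0 moves the midrange noticeably while leaving the median and the trimmed mean unchanged, which is the situation in which a censored estimate is attractive.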


2. Estimates of standard deviation. The efficiency of the range, $w$, as an estimate
of the standard deviation in small samples, is well known. Similar estimates
using additional observations will also give high efficiencies for larger
sample sizes. Table II contains the efficiency of the range estimate compared
to the unbiased estimate based on the sample standard deviation. The quantity
$k$ which satisfies $E(kw) = \sigma$ is tabled for reference. Let us denote the subranges
$X_{N-i+1} - X_i$ by $w_{(i)}$ and $w_{(1)} = w$. The unbiased estimate of the type
$s' = k'(\sum w_{(i)})$, where the summation is over the subset of all $w_{(i)}$ which gives
minimum variance, is indicated in Table II. The column headed "Eff." refers
to the comparison with the unbiased sample standard deviation. The final column
gives the ratio of the variance of the best linear systematic statistic as
given in [2] to the variance of $s'$. By examining this ratio we can see that the loss
in efficiency due to the use of "zero or one" weights for each range rather than
the optimum weights given in [2], is not great.
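
A range-based estimate of the standard deviation can be sketched as follows. The code is an illustration only: the function names are hypothetical, and the constants are the multipliers of $w$ listed in Table II for $N = 2$ to $5$, which agree with the usual values of $k$ satisfying $E(kw) = \sigma$ for normal samples. Larger sample sizes, where sums of subranges are used, are left out.

```python
# Multipliers k with E(k * w) = sigma, for N = 2..5 (values as listed in Table II).
RANGE_CONSTANT = {2: 0.8862, 3: 0.5908, 4: 0.4857, 5: 0.4299}


def subrange(xs, i):
    """w_(i) = X_(N-i+1) - X_(i) for the ordered sample; w_(1) is the range."""
    s = sorted(xs)
    n = len(s)
    if not 1 <= i <= n // 2:
        raise ValueError("need 1 <= i <= N // 2")
    return s[n - i] - s[i - 1]


def sigma_from_range(xs):
    """Unbiased range estimate k * w of sigma for a normal sample of size 2..5."""
    n = len(xs)
    if n not in RANGE_CONSTANT:
        raise ValueError("constants are only listed here for N = 2..5")
    return RANGE_CONSTANT[n] * subrange(xs, 1)


print(sigma_from_range([1.0, 3.0, 2.0]))
print(subrange([7, 1, 4, 2], 2))
```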
THE INDIVIDUAL ERGODIC THEOREM OF INFORMATION THEORY¹


By Leo Breiman
University of California, Berkeley


1. Introduction. Information theory is largely concerned with stationary stochastic
processes $\dots, x_{-1}, x_0, x_1, \dots$ taking values in a finite "alphabet,"
$a_1, \dots, a_s$. In addition, it is usually assumed that the processes are ergodic,
that is to say, the shift operator $T$, defined on the sequence space $\Omega$ of the process
by shifting each coordinate of a sequence once to the right, is metrically transitive
with respect to the probability measure $p$ on $\Omega$.

A question of importance in information theory regarding these processes is
the nature and existence, in some sense, of the expression

(a)   $\displaystyle\lim_{n\to\infty} \Big(-\frac{1}{n}\log_2 p(x_0, \dots, x_{n-1})\Big)$.

In 1948 Shannon [1] showed that for stationary, ergodic Markov chains (a)
exists as a limit in probability and is equal to a constant. This limiting constant
was termed by Shannon the "entropy" of the process. In 1953 McMillan [2]
lifted the restriction to Markov chains and proved that if the process is merely
stationary and ergodic, then (a) exists as a limit in $L_1$ mean and is constant.
The purpose of this note is to prove that under the same conditions the limit (a)
exists almost surely (a.s.).


Received October 15, 1956. 


1 This paper was prepared with the support of the Office of Ordnance Research, U. S.
Army under Contract DA-04-200-ORD-171.
2. The modified Birkhoff theorem. The heart of the matter is the following
modification of the individual ergodic theorem.

THEOREM 1. Let $T$ be a metrically transitive 1-1 measure preserving transformation
of the probability space $(\Omega, \mathcal{B}, p)$ onto itself. Let $g_0(\omega), g_1(\omega), \dots$ be a
sequence of measurable functions on $\Omega$ converging a.s. to the function $g(\omega)$ such that
$E(\sup_k |g_k|) < \infty$. Then

$\displaystyle\lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} g_k(T^k\omega) = Eg$   a.s.

Proof. We write

$\displaystyle\frac{1}{n}\sum_{k=0}^{n-1} g_k(T^k\omega) = \frac{1}{n}\sum_{k=0}^{n-1} g(T^k\omega) + \frac{1}{n}\sum_{k=0}^{n-1} [g_k(T^k\omega) - g(T^k\omega)]$.

The conditions of the theorem imply that $E|g| < \infty$ and by Birkhoff's ergodic
theorem (see, for example, [3], pp. 464-469), the first term on the right above
converges a.s. to $Eg$. It remains to show that the second term converges a.s. to
zero. Let $G_N(\omega) = \sup_{k \ge N} |g_k(\omega) - g(\omega)|$; then for every fixed $N$

$\displaystyle\limsup_{n\to\infty} \Big|\frac{1}{n}\sum_{k=0}^{n-1} [g_k(T^k\omega) - g(T^k\omega)]\Big| \le \limsup_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} |g_k(T^k\omega) - g(T^k\omega)|$

$\displaystyle\qquad \le \lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} G_N(T^k\omega) = EG_N$   a.s.

The sequence $\{G_N\}$ converges monotonically to zero and

$EG_0 \le E(\sup_k |g_k| + |g|) < \infty$,

so by the monotone convergence theorem $EG_N \to 0$, which proves the theorem.
THEOREM 2. Let $\dots, x_{-1}, x_0, x_1, \dots$ be a stationary ergodic process ranging
over a finite number of values $a_1, \dots, a_s$. Then there is a constant $H$ such that

$\displaystyle\lim_{n\to\infty} \Big(-\frac{1}{n}\log_2 p(x_0, \dots, x_{n-1})\Big) = H$   a.s.
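
Theorem 2 can be illustrated by simulation in the simplest case of a stationary two-state Markov chain, for which the limit $H$ is the entropy rate $-\sum_i \pi_i \sum_j P_{ij}\log_2 P_{ij}$. The sketch below is an illustration only: the transition matrix, the seed, and the path length are arbitrary choices, and the code handles two states only.

```python
import math
import random

# Hypothetical two-state chain and its stationary distribution (pi P = pi).
P = [[0.9, 0.1], [0.4, 0.6]]
PI = [0.8, 0.2]


def entropy_rate(P, pi):
    """H = -sum_i pi_i sum_j P_ij log2 P_ij for a stationary Markov chain."""
    return -sum(pi[i] * P[i][j] * math.log2(P[i][j])
                for i in range(len(P)) for j in range(len(P)) if P[i][j] > 0)


def sample_information_rate(P, pi, n, rng):
    """-(1/n) log2 p(x_0, ..., x_{n-1}) along one simulated two-state path."""
    x = 0 if rng.random() < pi[0] else 1
    log_p = math.log2(pi[x])
    for _ in range(n - 1):
        y = 0 if rng.random() < P[x][0] else 1
        log_p += math.log2(P[x][y])
        x = y
    return -log_p / n


H = entropy_rate(P, PI)
rate = sample_information_rate(P, PI, n=100_000, rng=random.Random(0))
print(H, rate)
```

A single long path is used deliberately: the theorem is a statement about almost every individual sequence, not about an average over many sequences.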


Proof. Let

$g_0(\omega) = -\log_2 p(x_0)$,

$g_k(\omega) = -\log_2 \dfrac{p(x_{-k}, x_{-k+1}, \dots, x_0)}{p(x_{-k}, x_{-k+1}, \dots, x_{-1})}$.

Then, letting $T$ be the shift operator,

$\displaystyle -\frac{1}{n}\log_2 p(x_0, \dots, x_{n-1}) = \frac{1}{n}\sum_{k=0}^{n-1} g_k(T^k\omega)$.

Since $T$ is 1-1, measure preserving and metrically transitive, we apply Theorem 1
and our work will be done as soon as we show that the sequence $\{g_k\}$
converges a.s. and that $E(\sup_k g_k) < \infty$.





To do this we use the inequality established by McMillan [2],

(i)   $\displaystyle\int_{\{m \le g_k < m+1\}} g_k\,dp \le s(m + 1)2^{-m}$.

We confine our attention to the cylinder set $Z_j \subset \Omega$, $Z_j = \{\omega;\ x_0 = a_j\}$. On $Z_j$
we have

$g_k(\omega) = -\log_2 p(x_0 = a_j \mid x_{-1}, \dots, x_{-k})$.

Since $p(x_0 = a_j \mid x_{-1}, \dots, x_{-k})$ is a martingale, it follows from the convexity of
$-\log$ and inequality (i) that the sequence $\{g_k\}$ is a semi-martingale (see [3],
p. 295). Therefore, $g_k$ converges a.s. on $Z_j$ and hence on $\Omega$.

Furthermore, by a semi-martingale inequality, [3] p. 317, we have, on $Z_j$,

$\displaystyle\int_{Z_j} (\sup_k g_k) \le \frac{e}{e - 1} + \frac{e}{e - 1}\,\sup_k \int_{Z_j} (g_k \log^+ g_k)$.

By using inequality (i) again, we bound the last term on the above right:

$\displaystyle\int_{Z_j} (g_k \log^+ g_k) = \sum_{m=0}^{\infty} \int_{Z_j \cap \{m \le g_k < m+1\}} (g_k \log^+ g_k) \le \sum_{m=0}^{\infty} s(m + 1)\log(m + 1)\,2^{-m}$.

Therefore $\int_{Z_j} (\sup_k g_k) < \infty$, by addition $E(\sup_k g_k) < \infty$, and the theorem is
proved.
It is a pleasure to acknowledge our debt to Professor David Blackwell who
suggested to us the problem treated herein.
A COUNTEREXAMPLE TO A THEOREM OF KOLMOGOROV¹,²


By Leo Breiman
University of California, Berkeley


Received August 10, 1956; revised January 18, 1957.

1 This paper was prepared with the support of the Office of Ordnance Research, U. S.
Army under Contract DA-04-200-ORD-171.

2 After this note was submitted the author was informed that C. Derman had constructed
a similar counterexample. While the note was in proof, a similar counterexample
appeared in a paper by Hartley Rogers, Jr., Proc. Am. Math. Soc., Vol. 8 (1957), pp. 518-520.


1. Introduction. In 1928 Kolmogorov [1] presented the now well-known degenerate
convergence theorem (weak law of large numbers) as follows (see, for
example, Loève [2]): let $X_1, X_2, \dots$ be independent random variables such that
$EX_k = 0$, $k = 1, 2, \dots$, and let

$X_{nk} = X_k$ if $|X_k| < n$,   $X_{nk} = 0$ if $|X_k| \ge n$.

Then

$\displaystyle\frac{1}{n}\sum_{k=1}^{n} X_k \to 0$ in probability

if and only if

(i)     $\displaystyle\sum_{k=1}^{n} P(|X_k| \ge n) \to 0$,

(ii)    $\displaystyle\frac{1}{n}\sum_{k=1}^{n} EX_{nk} \to 0$,

(iii)   $\displaystyle\frac{1}{n^2}\sum_{k=1}^{n} \sigma^2 X_{nk} \to 0$.

Presented without proof in the same paper was a sharpened version of the above
theorem with condition (iii) replaced by

(iii')  $\displaystyle\frac{1}{n^2}\sum_{k=1}^{n} EX_{nk}^2 \to 0$.

A proof of this last theorem was given in Gnedenko and Kolmogorov's 1949
book and carried over into the English edition ([3], pp. 135-137). Unfortunately,
the proof contains a slight gap and the sharpened theorem is not correct. Since
it appears in several places in the literature, for example, in Loève ([2], p. 278),
and follows from Theorems 3.2 (p. 124) and 3.3 (p. 125) of Doob's book [4], the
following simple counterexample may be of interest to the reader:


We will show that conditions (i), (ii), (iii') are not necessary by proceeding
as follows: define the independent random variables $X_1, X_2, \dots$ by

$P(X_1 = 0) = 1$,
$P(X_k = (-1)^k k^{5/2}) = k^{-2}$,
$P(X_k = (-1)^{k+1} k^{1/2}(1 - k^{-2})^{-1}) = 1 - k^{-2}$,   $k = 2, 3, \dots$.

We verify immediately that $EX_k = 0$, $k = 1, 2, \dots$. Then we demonstrate
that conditions (i), (ii), (iii) above are satisfied. Finally we show that, contrary
to the theorem,

$\displaystyle\frac{1}{n^2}\sum_{k=1}^{n} EX_{nk}^2 \not\to 0$.

In the following proofs we take $n \ge 4$.
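Proof of (i). If $k^{5/2} < n$, then $X_{nk} = X_k$, and if $k \le n$, then
$k^{1/2}(1 - k^{-2})^{-1} < n$. Hence

$P(|X_k| \ge n) = 0$ if $k^{5/2} < n$,   $P(|X_k| \ge n) = k^{-2}$ if $k^{5/2} \ge n$ and $k \le n$,

$\displaystyle\sum_{k=1}^{n} P(|X_k| \ge n) = \sum_{k=[n^{2/5}]}^{n} \frac{1}{k^2} \to 0$,

where $[\cdot]$ denotes next higher integer.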
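Proof of (ii).

$EX_{nk} = 0$ if $k^{5/2} < n$,   $EX_{nk} = (-1)^{k+1} k^{1/2}$ if $n \le k^{5/2}$ and $k \le n$,

$\displaystyle\frac{1}{n}\sum_{k=1}^{n} EX_{nk} = \frac{1}{n}\sum_{k=[n^{2/5}]}^{n} (-1)^{k+1} k^{1/2}$.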

We use the inequality, valid for s > 1, 


1 
Ve -Ve=1< 


ty BXul s (2 Dts View 


N ml 2n 
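Proof of (iii).

$\sigma^2 X_{nk} = k^3 + k(1 - k^{-2})^{-1}$ if $k^{5/2} < n$,
$\sigma^2 X_{nk} = k(1 - k^{-2})^{-1} - k$ if $n \le k^{5/2}$ and $k \le n$.

For $k \ge 2$,

$k^3 + k(1 - k^{-2})^{-1} \le k^3 + \tfrac43 k \le 2k^3$,
$k(1 - k^{-2})^{-1} - k = k^{-1}(1 - k^{-2})^{-1} \le \tfrac43 k^{-1}$.

Hence

$\displaystyle\frac{1}{n^2}\sum_{k=1}^{n} \sigma^2 X_{nk} \le \frac{1}{n^2}\sum_{k=2}^{[n^{2/5}]} 2k^3 + \frac{1}{n^2}\sum_{k=[n^{2/5}]}^{n} \tfrac43 k^{-1} = O(n^{-2/5}) + O(n^{-2}\log n) \to 0$.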


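Finally, we show that $\frac{1}{n^2}\sum_{k=1}^{n} EX_{nk}^2 \not\to 0$. We have

$\displaystyle\frac{1}{n^2}\sum_{k=1}^{n} EX_{nk}^2 \ge \frac{1}{n^2}\sum_{k=[n^{2/5}]}^{n} k(1 - k^{-2})^{-1} \ge \frac{1}{n^2}\sum_{k=[n^{2/5}]}^{n} k \to \tfrac12$,

which completes the counterexample.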
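
The four conditions can be evaluated numerically for a large $n$. The sketch below is an illustration under the assumption that $X_k$ takes the value $(-1)^k k^{5/2}$ with probability $k^{-2}$ and the value $(-1)^{k+1} k^{1/2}(1 - k^{-2})^{-1}$ otherwise, for $k \ge 2$, with $X_1 = 0$; the function name and the choice $n = 10{,}000$ are arbitrary.

```python
def kolmogorov_conditions(n):
    """Return the quantities in (i), (ii), (iii) and (iii') at sample size n."""
    s_i = s_ii = s_iii = s_iii_prime = 0.0
    for k in range(2, n + 1):                 # X_1 = 0 contributes nothing
        q = k ** -2.0                         # probability of the large value
        big = k ** 2.5                        # magnitude of the large value
        small = k ** 0.5 / (1.0 - q)          # magnitude of the other value (< n)
        if big < n:                           # no truncation: X_nk = X_k
            mean = 0.0
            second = big ** 2 * q + small ** 2 * (1.0 - q)
        else:                                 # large value is truncated to 0
            s_i += q
            mean = (-1) ** (k + 1) * small * (1.0 - q)
            second = small ** 2 * (1.0 - q)
        s_ii += mean
        s_iii += second - mean ** 2
        s_iii_prime += second
    return s_i, s_ii / n, s_iii / n ** 2, s_iii_prime / n ** 2


c_i, c_ii, c_iii, c_iii_prime = kolmogorov_conditions(10_000)
print(c_i, c_ii, c_iii, c_iii_prime)
```

The first three values are small and shrink as $n$ grows, while the last stays near one half, which is the behaviour the counterexample relies on.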

It is a pleasure to be able to acknowledge our debt to M. Loéve, who brought 
the question to our attention and suggested further inquiry. We are also in- 
debted to R. K. N. Patell whose letter to M. Loéve was the cause of the re- 
examination of this theorem. 





ON THE COMBINING OF INTERBLOCK AND INTRABLOCK ESTIMATES
By D. A. S. Fraser


University of Toronto


In a recent paper Sprott [1] has considered methods for combining interblock 
and intrablock estimates of variety contrasts for incomplete block designs. The 
intrablock estimates are derived from treatment contrasts obtained within 
blocks. The interblock estimates presuppose that the block effects are random, 
independent, and identically distributed, and they are derived from contrasts 
among the block averages. Under normality the intrablock estimates are inde- 
pendent of the interblock estimates. 

Sprott compares two methods for producing combined estimates. The first 
method, introduced by Yates [2], is the familiar method of combining by weight- 
ing with the reciprocal of the variances, and is known to produce minimum 
variance when two real estimates of the same quantity are combined linearly. 
The second method, discussed by Rao [3] and Cochran and Cox [4], is to apply 
the method of maximum likelihood to the joint density function, and the result- 
ing estimate is linear in terms of the interblock and intrablock estimates. Sprott 
shows that, in general, the two methods are not equivalent. The second method 
is direct and has considerable theoretical weight behind its use. We are left then 
with the implication that one of the methods is incorrect for obtaining good 
estimates. In a sense this is not the case. Rather, one of the methods may be 
inappropriately applied. Weighting with reciprocal variances is appropriate to 
combining real estimates but if applied to vector estimates it ignores any co- 
variances and may not be optimum. 

Received October 29, 1956; revised January 10, 1957.

Suppose $x = (x_1, \dots, x_r)$ and $y = (y_1, \dots, y_r)$ are independent estimates
of the parameter $\eta = (\eta_1, \dots, \eta_r)$ and have nonsingular covariance matrices
$V$ and $W$ respectively. Also suppose, for the moment, that $x$ and $y$ are normal.
Then, the joint density function is a constant times

(1)   $\exp[-\tfrac12(x - \eta)V^{-1}(x - \eta)' - \tfrac12(y - \eta)W^{-1}(y - \eta)']$,

which can be factored as

$\exp\{-\tfrac12[(xV^{-1} + yW^{-1})(V^{-1} + W^{-1})^{-1} - \eta](V^{-1} + W^{-1})[(xV^{-1} + yW^{-1})(V^{-1} + W^{-1})^{-1} - \eta]'\}$
$\times \exp\{-\tfrac12(x - y)(V + W)^{-1}(x - y)'\}$.

Then, assuming that the covariances are known, we see that

(2)   $(xV^{-1} + yW^{-1})(V^{-1} + W^{-1})^{-1}$

is a sufficient statistic for $\eta$. Also, from its non-singular distribution, it follows
that (2) is a complete statistic. Hence, by the theorems of Lehmann and Scheffé
which may be found on pp. 61-64 in [5], an unbiased estimate of $\eta$ based on (2)
will be unique, and it will have minimum concentration ellipsoid among all unbiased
estimates, linear or not. Each coordinate will also have minimum variance.
From (1), we see immediately that (2), as it stands, is an unbiased estimate of $\eta$,
and hence has the properties above. If the assumption of normality is removed,
(2) remains unbiased and has minimum variances among unbiased linear estimates.

The question arises as to when the two methods are equivalent and the more- 
direct first method usable in place of the second. A combined estimate based on 
the first method would have the form 


(3) $xD + y(I - D)$, 


where $D$ is a diagonal matrix and $I$ is the identity matrix ($r \times r$). If (2) reduces 
to this form (3), then 


(4) $V^{-1}(V^{-1} + W^{-1})^{-1} = (I + W^{-1}V)^{-1}$ 
is diagonal and hence $VW^{-1}$ is diagonal: $VW^{-1} = D^*$. We have 
(5) $V = D^*W = WD^*$, 


where the second expression follows from the symmetry of covariance matrices. 
(5) implies that for any non-zero off-diagonal element in W the corresponding 
two coordinate indices have equal diagonal elements in D*. Therefore, by per- 
muting coordinates it follows that W can be made diagonal in blocks and that V 
is obtained by multiplying each block by a positive constant. Thus for the vec- 
tors x, y the rearranged coordinates fall into independent groups. A group for 
x and a group for y have the same covariance matrix except for a single scale 
factor. There are two extremes for this: first, x and y have the same covariance 
matrix except for a single factor; second, x and y both have independent 
coordinates. 
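
Condition (5) can be checked directly for given covariance matrices. The sketch below is illustrative only: the function name `scale_matrix` and the example matrices are assumptions, not part of the original argument. It returns the diagonal of $D^*$ when $V = D^*W$ holds for some diagonal $D^*$, and `None` otherwise.

```python
from fractions import Fraction


def scale_matrix(V, W):
    """Return the diagonal of D* if V = D* W for a diagonal D*, else None.

    V and W are covariance matrices given as lists of rows, so every
    diagonal element W[i][i] is positive and the ratio below is defined.
    """
    d = [Fraction(V[i][i]) / Fraction(W[i][i]) for i in range(len(V))]
    for i, row in enumerate(V):
        for j, v in enumerate(row):
            # Row i of V must equal d[i] times row i of W.
            if v != d[i] * W[i][j]:
                return None
    return d


# Block-diagonal case: coordinates 0 and 1 form one group, coordinate 2 another.
W1 = [[2, 1, 0], [1, 2, 0], [0, 0, 5]]
V1 = [[6, 3, 0], [3, 6, 0], [0, 0, 10]]
print(scale_matrix(V1, W1))

# Same variances but different covariances: the condition fails.
V2 = [[2, 1], [1, 2]]
W2 = [[2, -1], [-1, 2]]
print(scale_matrix(V2, W2))
```

In the first example the two coordinates joined by a non-zero off-diagonal element of $W$ share the same diagonal element of $D^*$, as the argument above requires; in the second, coordinatewise weighting is not equivalent to (2).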

Thus, in general, it is not enough to combine estimates, coordinate by coor- 
dinate. An estimate with optimum properties is obtained by weighting the 
vectors by the inverse of their covariance matrices. We can easily see that doing 
this for the incomplete block problem considered by Sprott will produce the 
estimates as obtained by the second method. For, the second method uses maximum 
likelihood on the joint density, and from (1) we see that this obviously 
produces (2). 
ABSTRACTS OF PAPERS 


(Additional abstracts of papers presented at the Washington meeting of the Institute, 
March 7-9, 1957) 


1. The K-Visit Method of Consumer Testing, GEORGE E. FERRIS, General 
Foods Corporation, (By Title). 


When testing a pair of products for consumer preference the problem of how to treat or 
interpret no preference votes arises. A method is described of collecting data from a given 
number of consumers by repeated visits to them, or by obtaining repeated judgments 
from them in stores, which enables the estimation of the true proportion of those con- 
sumers in the population who have a preference for either product and of those who cannot 
discriminate or have no preference. For the model assumed, the maximum likelihood esti- 
mators of the above proportions are derived, their variance-covariance matrix is obtained, 
and a way of testing the appropriateness of the model is indicated. A decision theoretical 
formulation is suggested. (Received March 8, 1957; revised March 12, 1957.) 


2. Factorial Treatments in Group Divisible Incomplete Block Designs, CLYDE 
Y. KRAMER AND RALPH A. BRADLEY, Virginia Polytechnic Institute. 


Methods of incorporating factorial treatment combinations in group divisible incom- 
plete block designs are given. The factorial treatment combinations are so matched with 
the basic treatments in the association matrices of the designs that the sums of squares 
for the factorial effects can be obtained as functions of the original treatment estimators. 
It is shown first how a two-factor factorial may be incorporated into group divisible incom- 
plete block designs. Single degree of freedom contrasts are obtained for the effects in much 
the usual way as for factorials in complete block designs. Multifactor factorials and partial 
factorials are discussed, and a method of obtaining estimates and tests of significance of 
the effects is given. (Received March 18, 1957.) 


3. Iterative Experimentation, G. E. P. Box, Statistical Techniques Group, 
Princeton University. 


Scientific research is usually an iterative process. The cycle: conjecture-design-experiment-analysis 
leads to a new cycle of conjecture-design-experiment-analysis, and so on. 
It is helpful to keep this picture of the experimental method in mind when considering 
statistical problems. Although this cycle is repeated many times during an investigation, 
the experimental environment in which it is employed and the techniques appropriate for 
design and analysis tend to change as the investigation proceeds. Broadly speaking, one or 
more of the following four phases can be detected in most investigations: (a) a screening 
phase in which an attempt is made to isolate the important variables; (b) a descriptive 
phase in which the effects of the variables and the positions of interesting regions of the 
space of the variables are empirically determined; (c) a phase leading from (b) to (d); 
(d) a theoretical phase in which an attempt is made to understand the actual mechanism 
of the process studied. The roles which statistical methods should properly play in assisting 
the iterative process at these various phases of experimentation were briefly discussed. 
(Received March 14, 1957.) 


(Abstracts of papers for the Atlantic City meeting of the Institute, 
September 10-13, 1957) 


4. On the Joint Estimation of the Spectra, Cospectrum and Quadrature Spectrum 
of a Two-dimensional Stationary Gaussian Process, NATHANIEL ROY 
GOODMAN, New York University (introduced by Leon H. Herbach). 


The probability structure of a real two-dimensional stationary (zero mean) Gaussian 
process $[x(t), y(t)]$, $-\infty < t < \infty$, is specified (in the absolutely continuous case) by $f_{xx}(\lambda)$, 
$f_{yy}(\lambda)$ the spectral densities of the $x(t)$ and $y(t)$ processes respectively, $c(\lambda)$ the cospectral 
density, and $q(\lambda)$ the quadrature spectral density ($-\infty < \lambda < \infty$). The paper treats the 
problem of jointly estimating (in a suitable sense) $f_{xx}(\lambda)$, $f_{yy}(\lambda)$, $c(\lambda)$, $q(\lambda)$ from a finite part 
of a sample function of the $[x(t), y(t)]$, $-\infty < t < \infty$, process. An approximation to the 
joint sampling distribution of the estimators for $f_{xx}(\lambda)$, $f_{yy}(\lambda)$, $c(\lambda)$, $q(\lambda)$ is obtained. This 
approximate sampling distribution, termed a complex Wishart distribution, serves as the 
starting point in the derivation of approximate sampling distributions of estimators for 
functions of $f_{xx}(\lambda)$, $f_{yy}(\lambda)$, $c(\lambda)$, $q(\lambda)$. The paper was motivated by the need of experimenters 
in fields such as micrometeorology, oceanography, electrical engineering, and aeronautical 
engineering to statistically estimate “parameters” characterizing their particular physical 
systems and to treat the sampling variability of estimators for the “parameters.” In 
many cases the “parameters” to be estimated are functions of the densities $f_{xx}(\lambda)$, $f_{yy}(\lambda)$, 
$c(\lambda)$, $q(\lambda)$ of a real two-dimensional stationary (zero mean) Gaussian process. (Received 
March 29, 1957.) 


5. The Significance Probability of the Smirnov Two-Sample Test, J. L. HODGES, 
Jr., University of California, (By Title). 


The approximation P for the significance probability P of Smirnov’s two-sample test, 
based on the asymptotic theorem published by Smirnov in 1939, has been shown (Drion) 
to be accurate when the sample sizes m and n are equal. For this case the error of P is of 
order $1/n$ (Gnedenko). Example: m = n = 12, P = 0.032, P = 0.034. A small numerical 
investigation shows that the accuracy is much poorer when $m \neq n$. Example: n = 12, 
m = 13, P = 0.024, P = 0.054. An asymptotic theory is developed for $n \to \infty$ with $m - n > 0$ 
bounded. The error of P is now of order $1/\sqrt{n}$, and to this order is oscillatory as a function 
of P. This implies that no expression of the simple kind recently published by Korolyuk 
can be correct. An auxiliary table makes easy the computation of accurate estimates for 
P when $m - n$ is small and m < 30. (Received April 8, 1957). 


6. Nonparametric Mean and Variance Estimation from Truncated Data, JOHN 
E. WALSH, Lockheed Aircraft Corporation. 


This paper considers situations where a known number of the smallest values of a sam- 
ple and a known number of the largest values have been truncated. The problem is to 
obtain an estimate of the population mean, an estimate of the standard deviation of this 
estimate of the mean, and an estimate of the population standard deviation. This paper 
derives a nonparametric estimate for each of these three cases. These estimates are approxi- 
mately valid for most continuous statistical populations of practical interest when a small 
number of sample values are truncated and the sample size is not too small. The mean 
estimate consists of a linear function of the ordered values of the truncated sample, while 
each standard deviation estimate is the square root of a quadratic function of these ob- 
servations. (Received April 10, 1957.) 


7. Distinguishability of Sets of Distributions (The Case of Independent and 
Identically Distributed Chance Variables.), W. HOEFFDING, University of 
North Carolina, and J. WOLFOWITZ, Cornell University. 


Let $\mathcal{T}$ be a class of tests, based on a sequence of independent chance variables with the 
common distribution $F$ (assumed to belong to a set $\mathcal{F}$ of distributions), for testing whether 
$F$ belongs to one of two disjoint subsets, $\mathcal{G}$ and $\mathcal{H}$, of $\mathcal{F}$. We consider the cases where $\mathcal{T}$ is 
either the class of all tests which terminate with probability one if $F \in \mathcal{F}$, or the class of all 
fixed sample size tests, or one of several classes intermediate between these two. The sets 
$\mathcal{G}$ and $\mathcal{H}$ are said to be distinguishable in $\mathcal{T}$ if, for every $\epsilon > 0$, there exists a test in $\mathcal{T}$ such 
that the error probability is $< \epsilon$ for all $F \in \mathcal{G} \cup \mathcal{H}$. It is shown that if there exists a test in 
$\mathcal{T}$ such that the sum of the maximum error probability in $\mathcal{G}$ and the maximum error probability 
in $\mathcal{H}$ is less than 1, then $\mathcal{G}$ and $\mathcal{H}$ are distinguishable in $\mathcal{T}$. Sufficient conditions and 
necessary conditions for the distinguishability of two sets are expressed in terms of certain 
distance functions. Certain simple necessary conditions for distinguishability are 
found to be also sufficient if the class of distributions is suitably restricted. (Received 
May 20, 1957.) 


8. An Extension of Box’s Results on the Use of the F Distribution in Multi- 
variate Analysis, Seymour GEISSER AND SAMUEL W. GREENHOUSE, Na- 
tional Institute of Mental Health. 


The mixed model in a 2-way analysis of variance is characterized by a fixed classification, 
e.g. treatments, and a random classification, e.g. plots. Under the usual analysis of variance 
assumptions the proper error for the fixed effect is the fixed × random interaction 
component, and the resulting ratio has the F-distribution. If we have individuals instead 
of plots as the random component and the treatments are correlated, then Box has shown 
that one may still use the same F-ratio as before as a test of treatment effects; however, 
the F-ratio does not have the requisite F-distribution, but it can be shown that it is dis- 
tributed approximately like an F-distribution but with modified degrees of freedom. Box 
did this for one group of individuals; the authors have extended the Box technique to g 
groups of individuals and give the modified F-distribution for the tests of treatment effects 
and treatment × group interaction effects. (Received May 24, 1957.) 
NEWS AND NOTICES 
Readers are invited to submit to the Secretary of the Institute news items of interest 
Personal Items 


H. R. Bright, formerly Deputy Director of the Human Resources Research 
Office, George Washington University, is now employed as Specialist, Operations 
Research and Synthesis, of the Circuit Protective Devices Department, General 
Electric Company, Plainville, Conn. 

The appointments of Dr. Samuel H. Brooks and Julian T. Anderson to the 
Scientific Staff of Technical Operations, Incorporated, were announced today 
by Floyd I. Hill, Associate Director of the research and development firm’s 
West Coast office. 

Loudon Campbell is now with the Advertising Department of Eaton Labo- 
ratories, Norwich, New York. 

Z. T. Chang, who has recently come to the United States, has joined Osborne 
and Thurlow, 39 Broadway, New York City. 

Alan Constantine has joined a theoretical research group at the University 
of Adelaide, Adelaide, Sth. Australia. 

Norman R. Garner has resigned as Consulting Statistician to the Quality 
Control Department of the Naval Powder Factory, Indian Head, Maryland and 
has joined the Reliability Control Staff of the Aerojet-General Corporation, 
Azusa, California. 

Mark L. Hinkle, Jr. is presently with Western Electric Company as a De- 
velopment Engineer at the Hawthorne Works in the Department for Mechaniza- 
tion of Equipment Engineering. 

B. S. Kawar has been employed by Remington Rand Univac in St. Paul, 
Minn. to work with the group on “Programming Research and Development”. 

Truman L. Koehler, Jr. has left the employ of Sylvania Electric Products to 
accept a position as an experimental statistician with the Operational Statistics 
Group of the American Cyanamid Corporation. 

Dr. R. G. Laha of the Indian Statistical Institute (Calcutta) has been ap- 
pointed to the position of a Research Associate in the Mathematics Department 
of The Catholic University of America. He will do research in probability theory 
and mathematical statistics. 

Dr. Eugene Lukacs of the Mathematics Department of the Catholic University 
of America has been granted a leave in order to permit him to accept an invita- 
tion to give several lectures at the Sorbonne (Paris). 

Jacob Marschak is Professor of Economics and Research Associate of the 
Cowles Foundation for Research in Economics, Yale University. He is conduct- 
ing a project, under a contract with the Office of Naval Research, on Decision- 
Making under Uncertainty. 

Michael A. Martino, Jr. has accepted a position as mathematician at the Knolls 
Atomic Power Laboratory of the General Electric Company in Schenectady, 
New York. 

Albert Mindlin has been appointed Technical Assistant to the Chief of the 
Statistics Branch, Bureau of Old Age and Survivors Insurance, U. S. Department 
of Health, Education, and Welfare. 

Sidney I. Neuwirth has assumed the position of Operations Research Con- 
sultant at Johnson and Johnson, New Brunswick. 

Romuald Slimak has recently been appointed Adjunct Assistant Professor of 
Mathematics at the School of Engineering, New York University to teach computer 
methodology. 

Lt. (j.g.) F. Beckley Smith, Jr. is now serving on active duty in the U. S. 
Navy, stationed in Washington, D. C. 

Dr. M. D. Springer of the Naval Ordnance Plant, Indianapolis, has accepted 
a position as Senior Operations Analyst with Technical Operations, Inc., Fort 
Monroe, Virginia. 

David S. Stoller is now a member of the Technical Staff, Computer Systems 
Division of the Ramo-Wooldridge Corporation, Los Angeles. 

Dr. David E. Van Tijn has accepted a position on the staff of Arthur D. 
Little, Inc. 


Irving Weiss has left the Mathematics Department of Lehigh University to 
join the Bell Telephone Laboratories, North Andover, Massachusetts. 




New Members 


The following persons have been elected to membership in the Institute 
February 6, 1957 to May 2, 1957 


Anderson, Norman H., Ph.D. (Univ. of Wisconsin), Postdoctoral Fellow, Social Science 
Research Council, Yale Univ. Department of Psychology, 383 Cedar St., New Haven, 
Conn. 

Askovitz, S. I., M.D. (Univ. of Penna.), Medical Statistician, Albert Einstein Medical 
Center, York and Tabor Roads, Phila. 41, Pa.; Chief of Tumor Registry, School of 
Medicine, Univ. of Penna., Philadelphia 4, Pa.; 4900 North 9th Street, Phila. 41, Pa. 

Belson, Irving, M.A. (Columbia Univ.), Statistician, Mine Safety Appliances Research Corp., 
Callery, Pennsylvania. 

Bentley, D. L., B.S. (Stanford Univ.), Student, Dept. of Statistics, Stanford Univ., Box 
2695, Stanford, California. 

Borden, Julien Louis, B.A. (U.C.L.A.), Grad. Student, Research Assistant, U.C.L.A., 
Los Angeles, California; 6626 W. 5th St., Los Angeles 48, California. 

Brock, Dan A., M.A. (Southern Methodist University), Analyst and Statistician, Dallas 
AUSTRALIAN MATHEMATICAL SOCIETY 


During 1956 at a meeting in Melbourne 121 pure and applied mathematicians 
formed the Australian Mathematical Society. About a fourth of these are professional 
statisticians and the 1956 meeting featured a number of statistics sessions. 
The Council consists of T. M. Cherry (President), C. S. Davis (Treasurer), 
T. G. Room (Public. Secr.), A. L. Blakers, H. S. Green, H. O. Lancaster, H. C. 
Levey, P. A. Moran, E. J. G. Pitman, A. H. Pollard, L. C. Woods. The General 
Secretary is J. P. Ryan (Math. Dept., Univ. of Melbourne, Carlton N. 3, Victoria, 
Australia). The next meeting will be held in Sydney on 28-31 August 1957. 